Akida models API

Quantization blocks

conv_block

akida_models.quantization_blocks.conv_block(inputs, filters, kernel_size, weight_quantization=0, activ_quantization=0, pooling=None, pool_size=(2, 2), add_batchnorm=False, **kwargs)

Adds a quantized convolutional layer with optional layers in the following order: max pooling, batch normalization, quantized activation.

Parameters
  • inputs (tf.Tensor) – input tensor of shape (rows, cols, channels)

  • filters (int) – the dimensionality of the output space (i.e. the number of output filters in the convolution).

  • kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution kernel. Can be a single integer to specify the same value for all spatial dimensions.

  • weight_quantization (int) – quantization bitwidth for weights (usually 2, 4 or 8). For float weights (no quantization), set the value to zero.

  • activ_quantization (int) – quantization bitwidth for activation (usually 1, 2 or 4). For a float activation (ReLU6), set the value to zero. For no activation, set it to None.

  • pooling (str) – type of pooling layer to add, among ‘max’, ‘avg’, ‘global_max’ or ‘global_avg’, with pooling size set to pool_size. If None, no pooling is added.

  • pool_size (int or tuple of 2 integers) – factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.

  • add_batchnorm (bool) – add a tf.keras.layers.BatchNormalization layer

  • **kwargs – arguments passed to the tf.keras.layers.Conv2D layer, such as strides, padding, use_bias, kernel_regularizer, etc.

Returns

output tensor of the conv block.

Return type

tf.Tensor
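To make weight_quantization concrete, here is a minimal sketch of one common scheme: symmetric uniform quantization of a weight tensor to n bits with a single per-tensor scale. This scheme is an assumption for illustration only; the quantizer actually used by akida_models may pick its scale, rounding and number of levels differently. The helper name quantize_weights is hypothetical.

```python
def quantize_weights(weights, bitwidth):
    """Hypothetical helper: symmetric uniform quantization of a flat list of floats.

    bitwidth == 0 returns the float weights unchanged, mirroring the
    "set the value to zero for float weights" convention described above.
    """
    if bitwidth == 0:
        return list(weights)
    if not 2 <= bitwidth <= 8:
        raise ValueError("bitwidth must be 0 or between 2 and 8")
    # Largest positive integer level, e.g. 7 for 4 bits (levels -7..7).
    max_level = 2 ** (bitwidth - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0 for _ in weights]
    scale = max_abs / max_level
    # Snap each weight to the nearest integer level, then map back to float.
    return [round(w / scale) * scale for w in weights]


w = [0.9, -0.35, 0.1, -0.9]
print(quantize_weights(w, 2))  # [0.9, 0.0, 0.0, -0.9]
```

With this symmetric variant, n bits give 2**n - 1 distinct values, so 2-bit weights collapse to three levels.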

separable_conv_block

akida_models.quantization_blocks.separable_conv_block(inputs, filters, kernel_size, weight_quantization=0, activ_quantization=0, pooling=None, pool_size=(2, 2), add_batchnorm=False, **kwargs)

Adds a quantized separable convolutional layer with optional layers in the following order: global average pooling, max pooling, batch normalization, quantized activation.

Parameters
  • inputs (tf.Tensor) – input tensor of shape (height, width, channels)

  • filters (int) – the dimensionality of the output space (i.e. the number of output filters in the pointwise convolution).

  • kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

  • weight_quantization (int) – quantization bitwidth for weights (usually 2, 4 or 8). For float weights (no quantization), set the value to zero.

  • activ_quantization (int) – quantization bitwidth for activation (usually 1, 2 or 4). For a float activation (ReLU6), set the value to zero. For no activation, set it to None.

  • pooling (str) – type of pooling layer to add, among ‘max’, ‘avg’, ‘global_max’ or ‘global_avg’, with pooling size set to pool_size. If None, no pooling is added.

  • pool_size (int or tuple of 2 integers) – factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.

  • add_batchnorm (bool) – add a tf.keras.layers.BatchNormalization layer

  • **kwargs – arguments passed to the tf.keras.layers.SeparableConv2D layer, such as strides, padding, use_bias, etc.

Returns

output tensor of the separable conv block.

Return type

tf.Tensor
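The practical difference between conv_block and separable_conv_block shows up in the parameter count. The sketch below applies the standard Keras counting rules for Conv2D and SeparableConv2D (depth multiplier 1, one bias per output filter when enabled). It does not call akida_models, and the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Parameters of a standard Conv2D layer (Keras counting)."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + (filters if use_bias else 0)


def separable_conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Parameters of a SeparableConv2D layer with depth_multiplier=1."""
    kh, kw = kernel_size
    depthwise = kh * kw * in_channels  # one kh x kw kernel per input channel
    pointwise = in_channels * filters  # 1x1 convolution mixing the channels
    return depthwise + pointwise + (filters if use_bias else 0)


# 3x3 kernel, 128 input channels, 256 output filters, no bias
print(conv2d_params((3, 3), 128, 256, use_bias=False))            # 294912
print(separable_conv2d_params((3, 3), 128, 256, use_bias=False))  # 33920
```

For this layer shape the separable block needs roughly 8.7 times fewer weights, which is why the MobileNet-like models in the model zoo rely on it.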

dense_block

akida_models.quantization_blocks.dense_block(inputs, units, weight_quantization=0, activ_quantization=0, add_batchnorm=False, **kwargs)

Adds a quantized dense layer with optional layers in the following order: batch normalization, quantized activation.

Parameters
  • inputs (tf.Tensor) – Input tensor of shape (rows, cols, channels)

  • units (int) – dimensionality of the output space

  • weight_quantization (int) – quantization bitwidth for weights (usually 2, 4 or 8). For float weights (no quantization), set the value to zero.

  • activ_quantization (int) – quantization bitwidth for activation (usually 1, 2 or 4). For a float activation (ReLU6), set the value to zero. For no activation, set it to None.

  • add_batchnorm (bool) – add a tf.keras.layers.BatchNormalization layer

  • **kwargs – arguments passed to the tf.keras.layers.Dense layer, such as use_bias, kernel_initializer, kernel_regularizer, etc.

Returns

output tensor of the dense block.

Return type

tf.Tensor
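In the quantization blocks, activ_quantization replaces the float ReLU6 with a quantized activation. As a hedged illustration, here is one plausible n-bit quantized ReLU: clip to [0, 6] as ReLU6 does, then snap to 2**n - 1 uniform steps above zero. The clipping range and step placement are assumptions, not the documented behavior of the akida_models activation layer, and quantized_relu is a hypothetical name.

```python
def quantized_relu(x, bitwidth, max_value=6.0):
    """Hypothetical n-bit quantized ReLU over [0, max_value].

    bitwidth == 0 behaves like a float ReLU6 (no quantization).
    """
    clipped = min(max(x, 0.0), max_value)
    if bitwidth == 0:
        return clipped
    levels = 2 ** bitwidth - 1  # e.g. 3 steps above zero for 2 bits
    step = max_value / levels
    return round(clipped / step) * step


print([quantized_relu(v, 2) for v in (-1.0, 0.9, 2.5, 7.0)])  # [0.0, 0.0, 2.0, 6.0]
```

With bitwidth 1 this sketch produces a binary activation (0 or max_value).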

Layer blocks

conv_block

akida_models.layer_blocks.conv_block(inputs, filters, kernel_size, pooling=None, pool_size=(2, 2), add_batchnorm=False, add_activation=True, **kwargs)

Adds a convolutional layer with optional layers in the following order: max pooling, batch normalization, activation.

Parameters
  • inputs (tf.Tensor) – input tensor of shape (rows, cols, channels)

  • filters (int) – the dimensionality of the output space (i.e. the number of output filters in the convolution).

  • kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution kernel. Can be a single integer to specify the same value for all spatial dimensions.

  • pooling (str) – type of pooling layer to add, among ‘max’, ‘avg’, ‘global_max’ or ‘global_avg’, with pooling size set to pool_size. If None, no pooling is added.

  • pool_size (int or tuple of 2 integers) – factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.

  • add_batchnorm (bool) – add a BatchNormalization layer

  • add_activation (bool) – add a ReLU layer

  • **kwargs – arguments passed to the tf.keras.layers.Conv2D layer, such as strides, padding, use_bias, kernel_regularizer, etc.

Returns

output tensor of the conv block.

Return type

tf.Tensor
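The combined effect of kernel_size, strides, padding and pool_size on the output shape can be worked out with the standard Keras shape rules. This sketch assumes the convolution is applied first, followed by the optional pooling layer, and that pooling uses the Keras defaults (stride equal to pool_size, ‘valid’ padding); both are assumptions about conv_block, and conv_block_output_shape is a hypothetical helper.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output length of a Keras Conv2D or pooling layer along one axis."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


def conv_block_output_shape(input_shape, filters, kernel_size, strides=(1, 1),
                            padding="valid", pooling=None, pool_size=(2, 2)):
    """Hypothetical shape calculator for a conv -> optional pooling block."""
    rows, cols, _ = input_shape
    rows = conv_output_size(rows, kernel_size[0], strides[0], padding)
    cols = conv_output_size(cols, kernel_size[1], strides[1], padding)
    if pooling in ("global_max", "global_avg"):
        # Global pooling keeps only the channel dimension.
        return (filters,)
    if pooling in ("max", "avg"):
        rows = conv_output_size(rows, pool_size[0], pool_size[0], "valid")
        cols = conv_output_size(cols, pool_size[1], pool_size[1], "valid")
    return (rows, cols, filters)


print(conv_block_output_shape((32, 32, 3), 64, (3, 3), padding="same", pooling="max"))
# (16, 16, 64)
```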

separable_conv_block

akida_models.layer_blocks.separable_conv_block(inputs, filters, kernel_size, pooling=None, pool_size=(2, 2), add_batchnorm=False, add_activation=True, **kwargs)

Adds a separable convolutional layer with optional layers in the following order: global average pooling, max pooling, batch normalization, activation.

Parameters
  • inputs (tf.Tensor) – input tensor of shape (height, width, channels)

  • filters (int) – the dimensionality of the output space (i.e. the number of output filters in the pointwise convolution).

  • kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

  • pooling (str) – type of pooling layer to add, among ‘max’, ‘avg’, ‘global_max’ or ‘global_avg’, with pooling size set to pool_size. If None, no pooling is added.

  • pool_size (int or tuple of 2 integers) – factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.

  • add_batchnorm (bool) – add a BatchNormalization layer

  • add_activation (bool) – add a ReLU layer

  • **kwargs – arguments passed to the tf.keras.layers.SeparableConv2D layer, such as strides, padding, use_bias, etc.

Returns

output tensor of the separable conv block.

Return type

tf.Tensor

dense_block

akida_models.layer_blocks.dense_block(inputs, units, add_batchnorm=False, add_activation=True, **kwargs)

Adds a dense layer with optional layers in the following order: batch normalization, activation.

Parameters
  • inputs (tf.Tensor) – Input tensor of shape (rows, cols, channels)

  • units (int) – dimensionality of the output space

  • add_batchnorm (bool) – add a BatchNormalization layer

  • add_activation (bool) – add a ReLU layer

  • **kwargs – arguments passed to the tf.keras.layers.Dense layer, such as use_bias, kernel_initializer, kernel_regularizer, etc.

Returns

output tensor of the dense block.

Return type

tf.Tensor

Model zoo

Mobilenet

ImageNet

akida_models.mobilenet_imagenet(input_shape=None, alpha=1.0, dropout=0.001, include_top=True, pooling=None, classes=1000, weight_quantization=0, activ_quantization=0, input_weight_quantization=None)

Instantiates the MobileNet architecture.

Parameters
  • input_shape (tuple) – optional shape tuple.

  • alpha (float) –

    controls the width of the model.

    • If alpha < 1.0, proportionally decreases the number of filters in each layer.

    • If alpha > 1.0, proportionally increases the number of filters in each layer.

    • If alpha = 1, the default number of filters from the paper is used at each layer.

  • dropout (float) – dropout rate

  • include_top (bool) – whether to include the fully-connected layer at the top of the model.

  • pooling (str) –

    Optional pooling mode for feature extraction when include_top is False.

    • None means that the output of the model will be the 4D tensor output of the last convolutional block.

    • avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.

  • classes (int) – optional number of classes to classify images into, only to be specified if include_top is True.

  • weight_quantization (int) –

    sets all weights in the model to have a particular quantization bitwidth except for the weights in the first layer.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

  • activ_quantization (int) –

    sets all activations in the model to have a particular activation quantization bitwidth.

    • ‘0’ implements floating point 32-bit activations.

    • ‘2’ through ‘8’ implement n-bit activations where n is from 2-8 bits.

  • input_weight_quantization (int) –

    sets weight quantization in the first layer. Defaults to the weight_quantization value.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

Returns

A Keras model instance.

Raises

ValueError – in case of invalid argument for weights, or invalid input shape.
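The effect of alpha on the layer widths can be worked out by hand. The sketch below uses the MobileNet v1 filter counts from the original paper (arXiv:1704.04861): a first standard convolution with 32 filters followed by 13 pointwise convolutions. It scales them with int(filters * alpha), the truncation used by the Keras reference MobileNet; whether mobilenet_imagenet rounds the same way is an assumption, and scaled_filters is a hypothetical helper.

```python
# MobileNet v1 filter counts: first standard conv, then the 13 pointwise convs.
BASE_FILTERS = [32, 64, 128, 128, 256, 256] + [512] * 6 + [1024, 1024]


def scaled_filters(alpha):
    """Apply the width multiplier alpha, truncating like the Keras reference model."""
    return [int(f * alpha) for f in BASE_FILTERS]


for alpha in (0.25, 0.5, 1.0):
    print(alpha, scaled_filters(alpha)[-1])  # width of the last pointwise conv
# 0.25 256
# 0.5 512
# 1.0 1024
```

Because parameter counts of pointwise convolutions grow with the product of input and output widths, halving alpha reduces most layer sizes by about a factor of four.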

akida_models.mobilenet_imagenet_pretrained(alpha=1.0)

Helper method to retrieve a mobilenet_imagenet model that was trained on the ImageNet dataset.

Parameters

alpha (float) – width of the model.

Returns

a Keras Model instance.

Return type

tf.keras.Model

akida_models.mobilenet_cats_vs_dogs_pretrained()

Helper method to retrieve a mobilenet_imagenet model that was trained on the Cats vs. Dogs dataset.

Returns

a Keras Model instance.

Return type

tf.keras.Model

Preprocessing
akida_models.imagenet.preprocessing.process_record_dataset(dataset, is_training, batch_size, im_size, shuffle_buffer, parse_record_fn, num_epochs=1, dtype=tf.float32, datasets_num_private_threads=None, drop_remainder=False, tf_data_experimental_slack=False)

Given a Dataset with raw records, return an iterator over the records.

Parameters
  • dataset – A Dataset representing raw records

  • is_training – A boolean denoting whether the input is for training.

  • batch_size – The number of samples per batch.

  • shuffle_buffer – The buffer size to use when shuffling records. A larger value results in better randomness, but smaller values reduce startup time and use less memory.

  • parse_record_fn – A function that takes a raw record and returns the corresponding (image, label) pair.

  • num_epochs – The number of epochs to repeat the dataset.

  • dtype – Data type to use for images/features.

  • datasets_num_private_threads – Number of threads for a private threadpool created for all datasets computation.

  • drop_remainder – A boolean indicating whether to drop the remainder of the batches. If True, the batch dimension will be static.

  • tf_data_experimental_slack – Whether to enable tf.data’s experimental_slack option.

Returns

Dataset of (image, label) pairs ready for iteration.

akida_models.imagenet.preprocessing.get_filenames(is_training, data_dir)

Returns the filenames for the dataset.

akida_models.imagenet.preprocessing.parse_record(raw_record, im_size, is_training, dtype)

Parses a record containing a training example of an image.

The input record is parsed into a label and image, and the image is passed through preprocessing steps (cropping, flipping, and so on).

Parameters
  • raw_record – scalar Tensor tf.string containing a serialized Example protocol buffer.

  • is_training – A boolean denoting whether the input is for training.

  • dtype – data type to use for images/features.

Returns

Tuple with processed image tensor and one-hot-encoded label tensor.

akida_models.imagenet.preprocessing.input_fn(is_training, data_dir, batch_size, im_size, num_epochs=1, dtype=tf.float32, datasets_num_private_threads=None, parse_record_fn=<function parse_record>, input_context=None, drop_remainder=False, tf_data_experimental_slack=False, training_dataset_cache=False)

Input function which provides batches for train or eval.

Parameters
  • is_training – A boolean denoting whether the input is for training.

  • data_dir – The directory containing the input data.

  • batch_size – The number of samples per batch.

  • num_epochs – The number of epochs to repeat the dataset.

  • dtype – Data type to use for images/features.

  • datasets_num_private_threads – Number of private threads for tf.data.

  • parse_record_fn – Function to use for parsing the records.

  • input_context – A tf.distribute.InputContext object passed in by tf.distribute.Strategy.

  • drop_remainder – A boolean indicating whether to drop the remainder of the batches. If True, the batch dimension will be static.

  • tf_data_experimental_slack – Whether to enable tf.data’s experimental_slack option.

  • training_dataset_cache – Whether to cache the training dataset on workers. Typically used to improve training performance when training data is in remote storage and can fit into worker memory.

Returns

A dataset that can be used for iteration.

akida_models.imagenet.preprocessing.preprocess_image(image_buffer, bbox, output_height, output_width, num_channels, is_training=False, alpha=128.0, beta=128.0)

Preprocesses the given image.

Preprocessing includes decoding, cropping, and resizing for both training and eval images. Training preprocessing, however, introduces some random distortion of the image to improve accuracy.

Parameters
  • image_buffer – scalar string Tensor representing the raw JPEG image buffer.

  • bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].

  • output_height – The height of the image after preprocessing.

  • output_width – The width of the image after preprocessing.

  • num_channels – Integer depth of the image buffer for decoding.

  • is_training – True if we’re preprocessing the image for training and False otherwise.

Returns

A preprocessed image.

akida_models.imagenet.preprocessing.index_to_label(index)

Function to get an ImageNet label from an index.

Parameters

index – ImageNet class index, between 0 and 999

Returns

a string of comma-separated labels

akida_models.imagenet.preprocessing.resize_and_crop(image_buffer, output_height, output_width, num_channels)

Resize and crop the given image.

Parameters
  • image_buffer – scalar string Tensor representing the raw JPEG image buffer.

  • output_height – The height of the image after preprocessing.

  • output_width – The width of the image after preprocessing.

  • num_channels – Integer depth of the image buffer for decoding.

Returns

A resized and cropped image as a numpy array in uint8.

DS-CNN

CIFAR-10

akida_models.ds_cnn_cifar10(input_shape=(32, 32, 3), classes=10, weight_quantization=0, activ_quantization=0, input_weight_quantization=None)

Instantiates a MobileNet-like model for the “Cifar-10” example. This model is based on the MobileNet architecture, mainly with fewer layers. The weights and activations are quantized such that it can be converted into an Akida model.

This architecture originates from https://arxiv.org/abs/1704.04861 and is inspired by https://arxiv.org/pdf/1711.07128.pdf.

Parameters
  • input_shape (tuple) – input shape tuple of the model

  • classes (int) – number of classes to classify images into

  • weight_quantization (int) –

    sets all weights in the model to have a particular quantization bitwidth except for the weights in the first layer.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

  • activ_quantization (int) –

    sets all activations in the model to have a particular activation quantization bitwidth.

    • ‘0’ implements floating point 32-bit activations.

    • ‘2’ through ‘8’ implement n-bit activations where n is from 2-8 bits.

  • input_weight_quantization (int) –

    sets weight quantization in the first layer. Defaults to the weight_quantization value.

    • None implements the same bitwidth as the other weights.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

Returns

a quantized Keras model for DS-CNN/cifar10

Return type

tf.keras.Model

akida_models.ds_cnn_cifar10_pretrained()

Helper method to retrieve a ds_cnn_cifar10 model that was trained on the CIFAR-10 dataset.

Returns

a Keras Model instance.

Return type

tf.keras.Model

KWS

akida_models.ds_cnn_kws(input_shape=(49, 10, 1), classes=33, include_top=True, weight_quantization=0, activ_quantization=0, input_weight_quantization=None, last_layer_activ_quantization=None)

Instantiates a MobileNet-like model for the “Keyword Spotting” example.

This model is based on the MobileNet architecture, mainly with fewer layers. The weights and activations are quantized such that it can be converted into an Akida model.

This architecture originates from https://arxiv.org/pdf/1711.07128.pdf and was created for the “Keyword Spotting” (KWS) or “Speech Commands” dataset.

Parameters
  • input_shape (tuple) – input shape tuple of the model

  • classes (int) – optional number of classes to classify words into, only to be specified if include_top is True.

  • include_top (bool) – whether to include the fully-connected layer at the top of the model.

  • weight_quantization (int) –

    sets all weights in the model to have a particular quantization bitwidth except for the weights in the first layer.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

  • activ_quantization (int) –

    sets all activations in the model to have a particular activation quantization bitwidth.

    • ‘0’ implements floating point 32-bit activations.

    • ‘1’ through ‘8’ implement n-bit activations where n is from 1-8 bits.

  • input_weight_quantization (int) –

    sets weight quantization in the first layer. Defaults to the weight_quantization value.

    • None implements the same bitwidth as the other weights.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

  • last_layer_activ_quantization (int) –

    sets activation quantization in the layer before the last. Defaults to the activ_quantization value.

    • None implements the same bitwidth as the other activations.

    • ‘0’ implements floating point 32-bit activations.

    • ‘1’ through ‘8’ implement n-bit activations where n is from 1-8 bits.

Returns

a quantized Keras model for MobileNet/KWS

Return type

tf.keras.Model

akida_models.ds_cnn_kws_pretrained()

Helper method to retrieve a ds_cnn_kws model that was trained on the KWS dataset.

Returns

a Keras Model instance.

Return type

tf.keras.Model

Preprocessing
akida_models.kws.preprocessing.prepare_model_settings(sample_rate, clip_duration_ms, window_size_ms, window_stride_ms, feature_bin_count)

Calculates common settings needed for all models.

Parameters
  • sample_rate – Number of audio samples per second.

  • clip_duration_ms – Length of each audio clip to be analyzed.

  • window_size_ms – Duration of frequency analysis window.

  • window_stride_ms – How far to move in time between frequency windows.

  • feature_bin_count – Number of frequency bins to use for analysis.

Returns

Dictionary containing common settings.

Raises

ValueError – If the preprocessing mode isn’t recognized.
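The arithmetic behind prepare_model_settings can be sketched as follows. It follows the TensorFlow speech_commands example, which the docstrings of this module closely match; the exact dictionary keys returned by akida_models are an assumption, and model_settings_sketch is a hypothetical name. With common KWS settings (16 kHz, 1 s clips, 30 ms windows, 20 ms stride, 10 bins) it yields a 49 x 10 feature map, consistent with the default input_shape=(49, 10, 1) of ds_cnn_kws.

```python
def model_settings_sketch(sample_rate, clip_duration_ms, window_size_ms,
                          window_stride_ms, feature_bin_count):
    """Hypothetical re-implementation of the settings arithmetic."""
    desired_samples = int(sample_rate * clip_duration_ms / 1000)
    window_size_samples = int(sample_rate * window_size_ms / 1000)
    window_stride_samples = int(sample_rate * window_stride_ms / 1000)
    length_minus_window = desired_samples - window_size_samples
    if length_minus_window < 0:
        spectrogram_length = 0
    else:
        # One frame for the first window, plus one per full stride after it.
        spectrogram_length = 1 + length_minus_window // window_stride_samples
    return {
        "desired_samples": desired_samples,
        "window_size_samples": window_size_samples,
        "window_stride_samples": window_stride_samples,
        "spectrogram_length": spectrogram_length,
        "fingerprint_width": feature_bin_count,
    }


settings = model_settings_sketch(16000, 1000, 30, 20, 10)
print(settings["spectrogram_length"], settings["fingerprint_width"])  # 49 10
```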

akida_models.kws.preprocessing.prepare_words_list(wanted_words)

Prepends common tokens to the custom word list.

Parameters

wanted_words – List of strings containing the custom words.

Returns

List with the standard silence and unknown tokens added.
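A minimal sketch of this behavior: the two special tokens are prepended, so they always get label indexes 0 and 1 and the custom words follow in their given order. The token strings below come from the TensorFlow speech_commands example and are an assumption for akida_models; prepare_words_list_sketch is a hypothetical name.

```python
# Token names assumed from the TensorFlow speech_commands example.
SILENCE_LABEL = "_silence_"
UNKNOWN_WORD_LABEL = "_unknown_"


def prepare_words_list_sketch(wanted_words):
    """Prepend the silence and unknown tokens to the custom word list."""
    return [SILENCE_LABEL, UNKNOWN_WORD_LABEL] + list(wanted_words)


print(prepare_words_list_sketch(["yes", "no"]))
# ['_silence_', '_unknown_', 'yes', 'no']
```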

akida_models.kws.preprocessing.which_set(filename, validation_percentage, testing_percentage)

Determines which data partition the file should belong to.

We want to keep files in the same training, validation, or testing sets even if new ones are added over time. This makes it less likely that testing samples will accidentally be reused in training when long runs are restarted for example. To keep this stability, a hash of the filename is taken and used to determine which set it should belong to. This determination only depends on the name and the set proportions, so it won’t change as other files are added.

It’s also useful to associate particular files as related (for example words spoken by the same person), so anything after ‘_nohash_’ in a filename is ignored for set determination. This ensures that ‘bobby_nohash_0.wav’ and ‘bobby_nohash_1.wav’ are always in the same set, for example.

Parameters
  • filename – File path of the data sample.

  • validation_percentage – How much of the data set to use for validation.

  • testing_percentage – How much of the data set to use for testing.

Returns

String, one of ‘training’, ‘validation’, or ‘testing’.
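The stable partitioning rule described above can be sketched in plain Python. This version follows the hashing approach of the TensorFlow speech_commands example: strip everything after ‘_nohash_’, hash the remaining file name with SHA-1, and map the hash to a percentage in [0, 100]. The constant MAX_NUM_WAVS_PER_CLASS and the name which_set_sketch are assumptions, so exact assignments may differ from akida_models.

```python
import hashlib
import os
import re

# Bound used to turn the hash into a percentage (value assumed from the
# TensorFlow speech_commands example).
MAX_NUM_WAVS_PER_CLASS = 2 ** 27 - 1


def which_set_sketch(filename, validation_percentage, testing_percentage):
    """Assign a file to 'training', 'validation' or 'testing' from a stable hash."""
    base_name = os.path.basename(filename)
    # Ignore everything after '_nohash_' so related clips share a partition.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    hash_hex = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    percentage_hash = ((int(hash_hex, 16) % (MAX_NUM_WAVS_PER_CLASS + 1))
                       * (100.0 / MAX_NUM_WAVS_PER_CLASS))
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"


a = which_set_sketch("yes/bobby_nohash_0.wav", 10, 10)
b = which_set_sketch("yes/bobby_nohash_1.wav", 10, 10)
print(a == b)  # True: both names reduce to 'bobby' before hashing
```

Since only the file name and the two percentages enter the computation, adding new files never moves existing ones between partitions.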

class akida_models.kws.preprocessing.AudioProcessor(sample_rate, clip_duration_ms, window_size_ms, window_stride_ms, feature_bin_count, data_url=None, data_dir=None, silence_percentage=0, unknown_percentage=0, wanted_words=None, validation_percentage=0, testing_percentage=0)

Handles loading, partitioning, and preparing audio training data.

Methods:

get_augmented_data_for_wav(wav_filename, …)

Applies the feature transformation process to a wav audio file, adding data augmentation (background noise and time shifting).

get_data(how_many, offset, …)

Gather samples from the data set, applying transformations as needed.

get_features_for_wav(wav_filename)

Applies the feature transformation process to the input_wav.

maybe_download_and_extract_dataset(data_url, …)

Download and extract data set tar file.

prepare_background_data()

Searches a folder for background noise audio, and loads it into memory.

prepare_data_index(silence_percentage, …)

Prepares a list of the samples organized by set and label.

prepare_processing_graph()

Builds a TensorFlow graph to apply the input distortions.

get_augmented_data_for_wav(wav_filename, background_frequency, background_volume_range, time_shift, num_augmented_samples=1)

Applies the feature transformation process to a wav audio file, adding data augmentation (background noise and time shifting).

Parameters
  • wav_filename (str) – The path to the input audio file.

  • background_frequency – How many clips will have background noise, 0.0 to 1.0.

  • background_volume_range – How loud the background noise will be.

  • time_shift – How much to randomly shift the clips by in time.

  • num_augmented_samples – How many samples will be generated using data augmentation.

Returns

NumPy data array containing the generated features for every augmented sample.
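The time-shift augmentation used by this method can be pictured on a plain list of samples: draw a random offset in [-time_shift, time_shift], shift the clip by that many samples, and zero-pad so its length is unchanged. This is a simplified, framework-free sketch; the real method works on TensorFlow tensors and also mixes in background noise. Treating time_shift as a number of samples and the names shift_clip and random_time_shift are assumptions.

```python
import random


def shift_clip(samples, shift):
    """Shift a clip by `shift` samples, zero-padding so its length is unchanged."""
    n = len(samples)
    shift = max(-n, min(n, shift))  # never shift further than the clip length
    if shift > 0:
        # Delay: zeros at the start, drop the tail.
        return [0.0] * shift + list(samples[:n - shift])
    if shift < 0:
        # Advance: drop the head, zeros at the end.
        return list(samples[-shift:]) + [0.0] * (-shift)
    return list(samples)


def random_time_shift(samples, time_shift, rng=random):
    """Draw a shift uniformly from [-time_shift, time_shift] and apply it."""
    return shift_clip(samples, rng.randint(-time_shift, time_shift))


print(shift_clip([1, 2, 3, 4, 5], 2))   # [0.0, 0.0, 1, 2, 3]
print(shift_clip([1, 2, 3, 4, 5], -2))  # [3, 4, 5, 0.0, 0.0]
```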

get_data(how_many, offset, background_frequency, background_volume_range, time_shift, mode)

Gather samples from the data set, applying transformations as needed.

When the mode is ‘training’, a random selection of samples will be returned, otherwise the first N clips in the partition will be used. This ensures that validation always uses the same samples, reducing noise in the metrics.

Parameters
  • how_many – Desired number of samples to return. -1 means the entire contents of this partition.

  • offset – Where to start when fetching deterministically.

  • background_frequency – How many clips will have background noise, 0.0 to 1.0.

  • background_volume_range – How loud the background noise will be.

  • time_shift – How much to randomly shift the clips by in time.

  • mode – Which partition to use, must be ‘training’, ‘validation’, or ‘testing’.

Returns

List of sample data for the transformed samples, and list of label indexes.

Raises

ValueError – If background samples are too short.

get_features_for_wav(wav_filename)

Applies the feature transformation process to the input_wav.

Runs the feature generation process (generally producing a spectrogram from the input samples) on the WAV file. This can be useful for testing and verifying implementations being run on other platforms.

Parameters

wav_filename – The path to the input audio file.

Returns

Numpy data array containing the generated features.

maybe_download_and_extract_dataset(data_url, dest_directory)

Download and extract data set tar file.

If the data set we’re using doesn’t already exist, this function downloads it from the TensorFlow.org website and unpacks it into a directory. If data_url is None, nothing is downloaded and the data directory is expected to contain the correct files already.

Parameters
  • data_url – Web location of the tar file containing the data set.

  • dest_directory – File path to extract data to.

prepare_background_data()

Searches a folder for background noise audio, and loads it into memory.

It’s expected that the background audio samples will be in a subdirectory named ‘_background_noise_’ inside the ‘data_dir’ folder, as .wavs that match the sample rate of the training data, but can be much longer in duration.

If the ‘_background_noise_’ folder doesn’t exist at all, this isn’t an error, it’s just taken to mean that no background noise augmentation should be used. If the folder does exist, but it’s empty, that’s treated as an error.

Returns

List of raw PCM-encoded audio samples of background noise.

Raises

Exception – If files aren’t found in the folder.

prepare_data_index(silence_percentage, unknown_percentage, wanted_words, validation_percentage, testing_percentage)

Prepares a list of the samples organized by set and label.

The training loop needs a list of all the available data, organized by which partition it should belong to, and with ground truth labels attached. This function analyzes the folders below the data_dir, figures out the right labels for each file based on the name of the subdirectory it belongs to, and uses a stable hash to assign it to a data set partition.

Parameters
  • silence_percentage – How much of the resulting data should be background.

  • unknown_percentage – How much should be audio outside the wanted classes.

  • wanted_words – Labels of the classes we want to be able to recognize.

  • validation_percentage – How much of the data set to use for validation.

  • testing_percentage – How much of the data set to use for testing.

Returns

Dictionary containing a list of file information for each set partition, and a lookup map for each class to determine its numeric index.

Raises

Exception – If expected files are not found.
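The silence_percentage and unknown_percentage arguments can be read as follows: for each partition, that percentage of the wanted-word clip count is added as extra silence or unknown samples, rounded up. This relative interpretation follows the TensorFlow speech_commands example and is an assumption for akida_models; extra_sample_counts is a hypothetical helper.

```python
import math


def extra_sample_counts(set_size, silence_percentage, unknown_percentage):
    """Silence and unknown samples to add to a partition of set_size wanted-word clips."""
    silence_size = int(math.ceil(set_size * silence_percentage / 100))
    unknown_size = int(math.ceil(set_size * unknown_percentage / 100))
    return silence_size, unknown_size


print(extra_sample_counts(1000, 10, 10))  # (100, 100)
```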

prepare_processing_graph()

Builds a TensorFlow graph to apply the input distortions.

Creates a graph that loads a WAVE file, decodes it, scales the volume, shifts it in time, adds in background noise, calculates a spectrogram, and then builds an MFCC fingerprint from that.

VGG

CIFAR-10

akida_models.vgg_cifar10(input_shape=(32, 32, 3), classes=10, weight_quantization=0, activ_quantization=0, input_weight_quantization=None)

Instantiates a VGG-like model for the “Cifar-10” example. This model is based on the VGG architecture, mainly with fewer layers. The weights and activations are quantized such that it can be converted into an Akida model.

Parameters
  • input_shape (tuple) – input shape tuple of the model

  • classes (int) – number of classes to classify images into

  • weight_quantization (int) –

    sets all weights in the model to have a particular quantization bitwidth except for the weights in the first layer.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

  • activ_quantization (int) –

    sets all activations in the model to have a particular activation quantization bitwidth.

    • ‘0’ implements floating point 32-bit activations.

    • ‘2’ through ‘8’ implement n-bit activations where n is from 2-8 bits.

  • input_weight_quantization (int) –

    sets weight quantization in the first layer. Defaults to the weight_quantization value.

    • None implements the same bitwidth as the other weights.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

Returns

a quantized Keras model for vgg/cifar10

Return type

tf.keras.Model

akida_models.vgg_cifar10_pretrained()

Helper method to retrieve a vgg_cifar10 model that was trained on the CIFAR-10 dataset.

Returns

a Keras Model instance.

Return type

tf.keras.Model

UTK Face

akida_models.vgg_utk_face(input_shape=(32, 32, 3), weight_quantization=0, activ_quantization=0, input_weight_quantization=None)

Instantiates a VGG-like model for the regression example on age estimation using the UTKFace dataset.

Parameters
  • input_shape (tuple) – input shape tuple of the model

  • weight_quantization (int) –

    sets all weights in the model to have a particular quantization bitwidth except for the weights in the first layer.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

  • activ_quantization (int) –

    sets all activations in the model to have a particular activation quantization bitwidth.

    • ‘0’ implements floating point 32-bit activations.

    • ‘1’ through ‘8’ implement n-bit activations where n is from 1-8 bits.

  • input_weight_quantization (int) –

    sets weight quantization in the first layer. Defaults to the weight_quantization value.

    • None implements the same bitwidth as the other weights.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights where n is from 2-8 bits.

Returns

a quantized Keras model for VGG/UTKFace

Return type

tf.keras.Model

akida_models.vgg_utk_face_pretrained()

Helper method to retrieve a vgg_utk_face model that was trained on the UTKFace dataset.

Returns

a Keras Model instance.

Return type

tf.keras.Model

Preprocessing
akida_models.utk_face.preprocessing.load_data()

Loads the dataset from the BrainChip data server.

Returns

train set, train labels, test set and test labels as numpy arrays

Return type

np.array, np.array, np.array, np.array

YOLO

akida_models.yolo_base(input_shape=(224, 224, 3), classes=1, nb_box=5, grid_size=(7, 7), alpha=1.0, dropout=0.001, weight_quantization=0, activ_quantization=0, input_weight_quantization=None)

Instantiates the YOLOv2 architecture.

Parameters
  • input_shape (tuple) – input shape tuple

  • classes (int) – number of classes to classify images into

  • nb_box (int) – number of anchor boxes to use

  • grid_size (tuple) – YOLO grid size tuple

  • alpha (float) – controls the width of the model

  • dropout (float) – dropout rate

  • weight_quantization (int) –

    sets all weights in the model to have a particular quantization bitwidth except for the weights in the first layer.

    • ’0’ implements floating point 32-bit weights.

    • ’2’ through ‘8’ implements n-bit weights where n is from 2-8 bits.

  • activ_quantization

    sets all activations in the model to have a particular activation quantization bitwidth.

    • ‘0’ implements floating point 32-bit activations.

    • ‘2’ through ‘8’ implement n-bit activations, where n is from 2 to 8 bits.

  • input_weight_quantization (int) –

    sets weight quantization in the first layer. Defaults to weight_quantization value.

    • ‘0’ implements floating point 32-bit weights.

    • ‘2’ through ‘8’ implement n-bit weights, where n is from 2 to 8 bits.

Returns

a Keras Model instance.

Return type

tf.keras.Model
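As a rough guide to what yolo_base produces, the sketch below computes the cell stride and the output shape of a standard YOLOv2 detection head, in which each grid cell predicts nb_box boxes, each made of 4 coordinates, 1 objectness score and one score per class. This layout is an assumption based on the usual YOLOv2 design rather than something read from akida_models, and yolo_output_layout is a hypothetical helper.

```python
def yolo_output_layout(input_shape=(224, 224, 3), grid_size=(7, 7), nb_box=5, classes=1):
    """Return the per-cell stride (in pixels) and the output shape of a YOLOv2-style head.

    Assumed layout: (grid_h, grid_w, nb_box, 4 box coordinates + 1 objectness + classes).
    """
    rows, cols, _ = input_shape
    grid_h, grid_w = grid_size
    # Each grid cell covers a rows/grid_h x cols/grid_w patch of the input image
    stride = (rows / grid_h, cols / grid_w)
    output_shape = (grid_h, grid_w, nb_box, 4 + 1 + classes)
    return stride, output_shape
```

With the default arguments each cell covers a 32 x 32 pixel patch, and the head scores 7 x 7 x 5 = 245 candidate boxes per image.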

akida_models.yolo_widerface_pretrained()

Helper method to retrieve a yolo_base model that was trained on the WiderFace dataset, and the anchors that are needed to interpret the model output.

Returns

a Keras Model instance and a list of anchors.

Return type

tf.keras.Model, list

akida_models.yolo_voc_pretrained()

Helper method to retrieve a yolo_base model that was trained on the PASCAL VOC2012 dataset for the ‘person’ and ‘car’ classes only, and the anchors that are needed to interpret the model output.

Returns

a Keras Model instance and a list of anchors.

Return type

tf.keras.Model, list

YOLO Toolkit

Processing

akida_models.detection.processing.load_image(image_path)

Loads an image from a path.

Parameters

image_path (str) – full path of the image to load

Returns

a TensorFlow image tensor

akida_models.detection.processing.preprocess_image(image_buffer, output_size, normalize=True)

Preprocess an image for YOLO inference.

Parameters
  • image_buffer (tf.Tensor) – image to preprocess

  • output_size (tuple) – shape of the image after preprocessing

  • normalize (bool, optional) – whether to normalize the image, defaults to True

Returns

A resized and normalized image as a NumPy array.

akida_models.detection.processing.decode_output(output, anchors, nb_classes, obj_threshold=0.5, nms_threshold=0.5)

Decodes a YOLO model output.

Parameters
  • output (tf.Tensor) – model output to decode

  • anchors (list) – list of anchor boxes

  • nb_classes (int) – number of classes

  • obj_threshold (float) – confidence threshold for a box

  • nms_threshold (float) – non-maximum suppression threshold

Returns

List of BoundingBox objects
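decode_output turns raw YOLO outputs into BoundingBox objects. The sketch below decodes a single anchor in a single grid cell with the usual YOLOv2 parametrization: sigmoid offsets inside the cell, exponential scaling of the anchor prior, and an objectness score multiplied by a softmax over classes. The raw value layout, anchors expressed in grid-cell units, and applying obj_threshold to the class-specific score are assumptions, and decode_prediction is a hypothetical helper. Since the documented function also takes an nms_threshold, it presumably applies non-maximum suppression to the decoded boxes as well.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    # Subtract the largest logit for numerical stability
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(raw, row, col, anchor, grid_size, obj_threshold=0.5):
    """Decode one anchor's raw output in one grid cell, YOLOv2-style.

    raw: [tx, ty, tw, th, to, class_logit_0, class_logit_1, ...] (assumed layout)
    anchor: (width, height) in grid-cell units (assumption)
    Returns (x1, y1, x2, y2, score, class_index) with coordinates relative to
    the image size, or None when the best class score is below obj_threshold.
    """
    grid_h, grid_w = grid_size
    tx, ty, tw, th, to = raw[:5]
    # Box center: sigmoid offset inside the cell, normalized by the grid size
    cx = (col + sigmoid(tx)) / grid_w
    cy = (row + sigmoid(ty)) / grid_h
    # Box size: anchor prior scaled by the exponential of the predicted offset
    w = anchor[0] * math.exp(tw) / grid_w
    h = anchor[1] * math.exp(th) / grid_h
    # Class-specific confidence = objectness x class probability
    probs = softmax(raw[5:])
    best = max(range(len(probs)), key=probs.__getitem__)
    score = sigmoid(to) * probs[best]
    if score < obj_threshold:
        return None
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, best)
```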

akida_models.detection.processing.parse_voc_annotations(gt_folder, image_folder, file_path, labels)

Loads PASCAL-VOC data.

Data is loaded using the ground-truth information and stored in a dictionary.

Parameters
  • gt_folder (str) – path to the folder containing ground truth files

  • image_folder (str) – path to the folder containing the images

  • file_path (str) – file containing the list of files to parse

  • labels (list) – list of labels of interest

Returns

a dictionary containing all data present in the ground truth file

Return type

dict
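For reference, each PASCAL VOC annotation is an XML file with a filename, an image size and one object element per box. The sketch below reads one such file with the standard library and keeps only the labels of interest, which is the kind of filtering the labels parameter suggests. The returned dictionary layout and the helper name parse_voc_xml are hypothetical and not necessarily what parse_voc_annotations produces.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text, labels):
    """Extract the image size and the boxes of interest from one VOC annotation.

    The returned layout is hypothetical:
    {"filename", "width", "height", "boxes": [{"label", "x1", "y1", "x2", "y2"}]}
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    entry = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": [],
    }
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in labels:
            continue  # keep only the labels of interest, e.g. 'person' and 'car'
        bndbox = obj.find("bndbox")
        entry["boxes"].append({
            "label": name,
            # Parsed as float to tolerate non-integer coordinates
            "x1": float(bndbox.findtext("xmin")),
            "y1": float(bndbox.findtext("ymin")),
            "x2": float(bndbox.findtext("xmax")),
            "y2": float(bndbox.findtext("ymax")),
        })
    return entry
```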

akida_models.detection.processing.parse_widerface_annotations(gt_file, image_folder)

Loads WiderFace data.

Data is loaded using the ground-truth information and stored in a dictionary.

Parameters
  • gt_file (str) – path to the ground truth file

  • image_folder (str) – path to the directory containing the images

Returns

a dictionary containing all data present in the ground truth file

Return type

dict
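The WiderFace ground-truth text file lists, for each image, its file name, the number of faces, and then one line per face starting with x, y, width and height, followed by attribute flags. The sketch below parses that layout into corner coordinates. It assumes the standard ground-truth file layout, including one all-zero placeholder line for images without faces, and skips degenerate boxes; the output structure and the helper name parse_widerface_gt are hypothetical.

```python
def parse_widerface_gt(lines):
    """Parse WiderFace ground-truth lines into a list of per-image entries.

    Hypothetical output layout: {"filename": str, "boxes": [(x1, y1, x2, y2), ...]}.
    """
    rows = (line.strip() for line in lines if line.strip())
    images = []
    for filename in rows:
        count = int(next(rows))
        boxes = []
        # Assumption: an image with no faces still has one all-zero row
        for _ in range(max(count, 1)):
            x, y, w, h = (int(v) for v in next(rows).split()[:4])
            if count > 0 and w > 0 and h > 0:
                # Convert (x, y, width, height) to corner coordinates
                boxes.append((x, y, x + w, y + h))
        images.append({"filename": filename, "boxes": boxes})
    return images
```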

class akida_models.detection.processing.BoundingBox(x1, y1, x2, y2, score=-1, classes=None)

Utility class to represent a bounding box.

The box is defined by its top-left corner (x1, y1), its bottom-right corner (x2, y2), a label, a score and classes.

Methods:

get_label()

Returns the label for this bounding box.

get_score()

Returns the score for this bounding box.

iou(other)

Computes intersection over union ratio between this bounding box and another one.

get_label()

Returns the label for this bounding box.

Returns

Index of the label as an integer.

get_score()

Returns the score for this bounding box.

Returns

Confidence as a float.

iou(other)

Computes intersection over union ratio between this bounding box and another one.

Parameters

other (BoundingBox) – the other bounding box for IOU computation

Returns

IOU value as a float
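IoU is the area of the overlap between two boxes divided by the area of their union, and it is also what non-maximum suppression compares against nms_threshold. The sketch below uses a hypothetical Box dataclass as a stand-in for BoundingBox and a simple greedy, class-agnostic NMS; real YOLO pipelines often run NMS per class, so treat this as an illustration of the technique rather than the library's code.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for BoundingBox: corners plus a score."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = -1.0

    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def iou(self, other):
        # Overlap rectangle; width and height clamp to 0 when boxes are disjoint
        inter_w = max(0.0, min(self.x2, other.x2) - max(self.x1, other.x1))
        inter_h = max(0.0, min(self.y2, other.y2) - max(self.y1, other.y1))
        inter = inter_w * inter_h
        union = self.area() + other.area() - inter
        return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, nms_threshold=0.5):
    """Greedy NMS: keep boxes by decreasing score, drop those overlapping a kept box too much."""
    kept = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        if all(box.iou(k) <= nms_threshold for k in kept):
            kept.append(box)
    return kept
```

For example, two 2 x 2 boxes offset by one unit in each direction overlap on a 1 x 1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.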

Performance

class akida_models.detection.map_evaluation.MapEvaluation(model, val_data, labels, anchors, period=1, obj_threshold=0.5, nms_threshold=0.5, max_box_per_image=10, is_keras_model=True)

Evaluates a given dataset using a given model. Code originally from https://github.com/fizyr/keras-retinanet.

Parameters
  • model (tf.keras.Model) – model to evaluate.

  • val_data (dict) – dictionary containing validation data as obtained using the preprocess_widerface.py module

  • labels (list) – list of labels as strings

  • anchors (list) – list of anchors boxes

  • period (int, optional) – how often, in epochs, the precision is printed; defaults to once per epoch.

  • obj_threshold (float, optional) – confidence threshold for a box

  • nms_threshold (float, optional) – non-maximum suppression threshold

  • max_box_per_image (int, optional) – maximum number of detections per image

  • is_keras_model (bool, optional) – indicates whether the model is a Keras model (True) or an Akida model (False)

Returns

A dict mapping class names to mAP scores.

Methods:

evaluate_map()

Evaluates current mAP score on the model.

on_epoch_end(epoch[, logs])

Called at the end of an epoch.

evaluate_map()

Evaluates current mAP score on the model.

Returns

the global mAP score and a dictionary mapping each label to its mAP.

Return type

tuple
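To make the returned scores concrete, the sketch below computes the all-point interpolated average precision (AP) for one class, the formulation used by many detection evaluators, including, to our understanding, the keras-retinanet code that MapEvaluation is based on. It assumes that detections have already been matched to ground-truth boxes (typically at an IoU of 0.5) and flagged as true or false positives; average_precision and mean_average_precision are hypothetical helpers, and taking the global score as the plain mean over classes is also an assumption.

```python
def average_precision(detections, num_annotations):
    """All-point interpolated AP for one class.

    detections: list of (score, is_true_positive) pairs
    num_annotations: number of ground-truth boxes for this class
    """
    if num_annotations == 0:
        return 0.0
    tp = fp = 0
    recall, precision = [], []
    # Walk detections from most to least confident, accumulating TP/FP counts
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_annotations)
        precision.append(tp / (tp + fp))
    # Add sentinels, then make precision monotonically non-increasing
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum precision over the points where recall changes
    return sum(
        (mrec[i + 1] - mrec[i]) * mpre[i + 1]
        for i in range(len(mrec) - 1)
        if mrec[i + 1] != mrec[i]
    )


def mean_average_precision(per_class):
    """per_class: {label: (detections, num_annotations)} -> (mAP, {label: AP})."""
    aps = {label: average_precision(d, n) for label, (d, n) in per_class.items()}
    global_map = sum(aps.values()) / len(aps) if aps else 0.0
    return global_map, aps
```

For instance, with two ground-truth boxes and detections ranked TP, FP, TP, recall reaches 0.5 at precision 1.0 and then 1.0 at precision 2/3, giving AP = 0.5 x 1.0 + 0.5 x 2/3 = 5/6.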

on_epoch_end(epoch, logs={})

Called at the end of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Parameters
  • epoch – integer, index of epoch.

  • logs – dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_.

Anchors

akida_models.detection.generate_anchors.generate_anchors(annotations_data, num_anchors=5, grid_size=(7, 7))

Creates anchors by clustering dimensions of the ground truth boxes from the training dataset.

Parameters
  • annotations_data (dict) – dictionary of preprocessed VOC data

  • num_anchors (int, optional) – number of anchors

  • grid_size (tuple, optional) – size of the YOLO grid

Returns

the computed anchors

Return type

list
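A common way to cluster ground-truth box sizes into anchors, introduced with YOLOv2, is k-means on (width, height) pairs using 1 - IoU as the distance, so that large boxes do not dominate the clustering as they would with a Euclidean distance. The sketch below assumes box sizes converted to grid-cell units; the helper names wh_iou, to_grid_units and kmeans_anchors are hypothetical, and this is not necessarily the exact procedure used by generate_anchors.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given only as (width, height), assuming a shared center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def to_grid_units(box, image_w, image_h, grid_w, grid_h):
    """Convert an (x1, y1, x2, y2) pixel box to (width, height) in grid cells."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * grid_w / image_w, (y2 - y1) * grid_h / image_h)


def kmeans_anchors(sizes, num_anchors=5, iterations=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; returns sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(sizes, num_anchors)
    for _ in range(iterations):
        clusters = [[] for _ in range(num_anchors)]
        for size in sizes:
            # Highest IoU is the same as the smallest 1 - IoU distance
            nearest = max(range(num_anchors), key=lambda k: wh_iou(size, centroids[k]))
            clusters[nearest].append(size)
        # New centroid = mean (w, h) of its cluster; empty clusters keep their centroid
        updated = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)
```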

Functions:

convtiny_dvs_gesture([input_shape, classes, …])

Instantiates a CNN for the “IBM DVS Gesture” example.

convtiny_dvs_handy([input_shape, classes, …])

Instantiates a CNN for the “Brainchip dvs_handy” example.

ds_cnn_cifar10([input_shape, classes, …])

Instantiates a MobileNet-like model for the “Cifar-10” example.

ds_cnn_cifar10_pretrained()

Helper method to retrieve a ds_cnn_cifar10 model that was trained on the CIFAR-10 dataset.

ds_cnn_kws([input_shape, classes, …])

Instantiates a MobileNet-like model for the “Keyword Spotting” example.

ds_cnn_kws_pretrained()

Helper method to retrieve a ds_cnn_kws model that was trained on the KWS dataset.

mobilenet_cats_vs_dogs_pretrained()

Helper method to retrieve a mobilenet_imagenet model that was trained on the Cats vs. Dogs dataset.

mobilenet_edge_imagenet(base_model, classes)

Instantiates a MobileNet-edge architecture.

mobilenet_imagenet([input_shape, alpha, …])

Instantiates the MobileNet architecture.

mobilenet_imagenet_pretrained([alpha])

Helper method to retrieve a mobilenet_imagenet model that was trained on the ImageNet dataset.

tse_mlp_cse2018([numerical_columns, …])

Instantiates a model composed of a trainable spike encoder and a multilayer perceptron for the tabular data example on CSE-CIC-IDS-2018 dataset.

tse_mlp_cse2018_pretrained()

Helper method to retrieve a tse_mlp_cse2018 model that was trained on the CSE-CIC-IDS-2018 dataset.

vgg_cifar10([input_shape, classes, …])

Instantiates a VGG-like model for the “Cifar-10” example.

vgg_cifar10_pretrained()

Helper method to retrieve a vgg_cifar10 model that was trained on the CIFAR-10 dataset.

vgg_utk_face([input_shape, …])

Instantiates a VGG-like model for the regression example on age estimation using UTKFace dataset.

vgg_utk_face_pretrained()

Helper method to retrieve a vgg_utk_face model that was trained on the UTKFace dataset.

yolo_base([input_shape, classes, nb_box, …])

Instantiates the YOLOv2 architecture.

yolo_voc_pretrained()

Helper method to retrieve a yolo_base model that was trained on the PASCAL VOC2012 dataset for the ‘person’ and ‘car’ classes only, and the anchors that are needed to interpret the model output.

yolo_widerface_pretrained()

Helper method to retrieve a yolo_base model that was trained on the WiderFace dataset, and the anchors that are needed to interpret the model output.