Akida models API
Imports models.
Layer blocks
CNN blocks
- akida_models.layer_blocks.conv_block(inputs, filters, kernel_size, pooling=None, post_relu_gap=False, pool_size=(2, 2), add_batchnorm=False, relu_activation='ReLU3.75', **kwargs)[source]
Adds a convolutional layer with optional layers in the following order: max pooling, batch normalization, activation.
- Parameters:
inputs (tf.Tensor) – input tensor of shape (rows, cols, channels)
filters (int) – the dimensionality of the output space (i.e. the number of output filters in the convolution).
kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution kernel. Can be a single integer to specify the same value for all spatial dimensions.
pooling (str, optional) – add a pooling layer of type ‘pooling’ among the values ‘max’ or ‘global_avg’, with pooling size set to pool_size. If ‘None’, no pooling will be added.
post_relu_gap (bool, optional) – when pooling is ‘global_avg’, indicates if the pooling comes before or after ReLU activation. Defaults to False.
pool_size (int or tuple of 2 integers, optional) – factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.
add_batchnorm (bool, optional) – add a BatchNormalization layer
relu_activation (str, optional) – the ReLU activation to add to the layer in the form ‘ReLUx’ where ‘x’ is the max_value to use. Set to False to disable activation. Defaults to ‘ReLU3.75’.
**kwargs – arguments passed to the keras.Conv2D layer, such as strides, padding, use_bias, weight_regularizer, etc.
- Returns:
output tensor of conv2D block.
- Return type:
tf.Tensor
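The ‘ReLUx’ convention used by relu_activation can be illustrated with a small sketch. The helpers below (parse_relu_max, bounded_relu) are hypothetical and written only for illustration; they are not part of akida_models and do not claim to reproduce how the library parses the string.

```python
def parse_relu_max(relu_activation):
    """Return the max_value encoded in a 'ReLUx' string, or None when disabled.

    Hypothetical helper for illustration only; not part of akida_models.
    """
    if relu_activation is False:
        return None  # activation disabled
    if not relu_activation.startswith("ReLU"):
        raise ValueError(f"expected a 'ReLUx' string, got {relu_activation!r}")
    # Everything after the 'ReLU' prefix is read as the clipping value.
    return float(relu_activation[len("ReLU"):])


def bounded_relu(x, max_value):
    """Clip a scalar to the range [0, max_value]."""
    return min(max(x, 0.0), max_value)


max_value = parse_relu_max("ReLU3.75")
print(max_value)                      # 3.75
print(bounded_relu(5.0, max_value))   # 3.75
print(bounded_relu(-1.0, max_value))  # 0.0
```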
- akida_models.layer_blocks.separable_conv_block(inputs, filters, kernel_size, strides=1, padding='same', use_bias=True, pooling=None, post_relu_gap=False, pool_size=(2, 2), add_batchnorm=False, relu_activation='ReLU3.75', fused=True, name=None, kernel_initializer='glorot_uniform', pointwise_regularizer=None)[source]
Adds a separable convolutional layer with optional layers in the following order: global average pooling, max pooling, batch normalization, activation.
- Parameters:
inputs (tf.Tensor) – input tensor of shape (height, width, channels)
filters (int) – the dimensionality of the output space (i.e. the number of output filters in the pointwise convolution).
kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides (int or tuple of 2 integers, optional) – strides of the depthwise convolution. Defaults to 1.
padding (str, optional) – padding mode for the depthwise convolution. Defaults to ‘same’.
use_bias (bool, optional) – whether the layer uses a bias vector. Defaults to True.
pooling (str, optional) – add a pooling layer of type ‘pooling’ among the values ‘max’, or ‘global_avg’, with pooling size set to pool_size. If ‘None’, no pooling will be added.
post_relu_gap (bool, optional) – when pooling is ‘global_avg’, indicates if the pooling comes before or after ReLU activation. Defaults to False.
pool_size (int or tuple of 2 integers, optional) – factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.
add_batchnorm (bool, optional) – add a BatchNormalization layer
relu_activation (str, optional) – the ReLU activation to add to the layer in the form ‘ReLUx’ where ‘x’ is the max_value to use. Set to False to disable activation. Defaults to ‘ReLU3.75’.
fused (bool, optional) – If True use a SeparableConv2D layer otherwise use a DepthwiseConv2D + Conv2D layers. Defaults to True.
name (str, optional) – name of the layer. Defaults to None.
kernel_initializer (keras.initializer, optional) – initializer for both kernels. Defaults to ‘glorot_uniform’.
pointwise_regularizer (keras.regularizers, optional) – regularizer function applied to the pointwise kernel matrix. Defaults to None.
- Returns:
output tensor of separable conv block.
- Return type:
tf.Tensor
- akida_models.layer_blocks.dense_block(inputs, units, add_batchnorm=False, relu_activation='ReLU3.75', **kwargs)[source]
Adds a dense layer with optional layers in the following order: batch normalization, activation.
- Parameters:
inputs (tf.Tensor) – Input tensor of shape (rows, cols, channels)
units (int) – dimensionality of the output space
add_batchnorm (bool, optional) – add a BatchNormalization layer
relu_activation (str, optional) – the ReLU activation to add to the layer in the form ‘ReLUx’ where ‘x’ is the max_value to use. Set to False to disable activation. Defaults to ‘ReLU3.75’.
**kwargs – arguments passed to the Dense layer, such as use_bias, kernel_initializer, weight_regularizer, etc.
- Returns:
output tensor of the dense block.
- Return type:
tf.Tensor
Transformers blocks
- akida_models.layer_blocks.mlp_block(inputs, mlp_dim, dropout, name, mlp_act='GeLU')[source]
MLP block definition.
- Parameters:
inputs (tf.Tensor) – inputs
mlp_dim (int) – number of units in the first dense layer
dropout (float) – dropout rate
name (str) – used as a base name for the layers in the block
mlp_act (str, optional) – one of [‘GeLU’, ‘ReLUx’, ‘swish’], selecting the GeLU, ReLUx or swish activation. Defaults to “GeLU”.
- Returns:
MLP block outputs
- Return type:
tf.Tensor
- akida_models.layer_blocks.multi_head_attention(x, num_heads, hidden_size, name, softmax='softmax')[source]
Multi-head attention block definition.
- Parameters:
x (tf.Tensor) – inputs
num_heads (int) – the number of attention heads
hidden_size (int) – query, key and value dense layers representation size (units)
name (str) – used as a base name for the layers in the block
softmax (str, optional) – one of [‘softmax’, ‘softmax2’], selecting the softmax or softmax2 activation. Defaults to ‘softmax’.
- Raises:
ValueError – if hidden_size is not a multiple of num_heads
- Returns:
block outputs and attention softmaxed scores
- Return type:
(tf.Tensor, tf.Tensor)
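The ValueError condition above follows from splitting hidden_size evenly across the heads. A minimal sketch of that check, using a hypothetical head_size helper that is not part of akida_models:

```python
def head_size(hidden_size, num_heads):
    """Return the per-head dimension, mirroring the documented constraint.

    Hypothetical helper: the real block builds Keras layers, this only
    shows why hidden_size must be a multiple of num_heads.
    """
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be a multiple of "
            f"num_heads ({num_heads})"
        )
    return hidden_size // num_heads


print(head_size(192, 3))  # 64
```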
- akida_models.layer_blocks.transformer_block(inputs, num_heads, hidden_size, mlp_dim, dropout, name, norm='LN', softmax='softmax', mlp_act='GeLU')[source]
Transformer block definition.
- Parameters:
inputs (tf.Tensor) – inputs
num_heads (int) – the number of attention heads
hidden_size (int) – multi-head attention block internal size
mlp_dim (int) – MLP block internal size
dropout (float) – dropout rate
name (str) – used as a base name for the layers in the block
norm (str, optional) – one of [‘LN’, ‘GN1’, ‘BN’, ‘LMN’], selecting respectively the LayerNormalization, GroupNormalization(groups=1, …), BatchNormalization or LayerMadNormalization layers used in the block. Defaults to ‘LN’.
softmax (str, optional) – one of [‘softmax’, ‘softmax2’], selecting the softmax or softmax2 activation in attention. Defaults to ‘softmax’.
mlp_act (str, optional) – one of [‘GeLU’, ‘ReLUx’, ‘swish’], selecting the GeLU, ReLUx or swish activation in the MLP block. Defaults to “GeLU”.
- Returns:
block outputs and (attention softmaxed scores, the normalized sum of inputs and attention outputs)
- Return type:
(tf.Tensor, (tf.Tensor, tf.Tensor))
Transposed blocks
- akida_models.layer_blocks.conv_transpose_block(inputs, filters, kernel_size, add_batchnorm=False, relu_activation='ReLU8', **kwargs)[source]
Adds a transposed convolutional layer with optional layers in the following order: batch normalization, activation.
- Parameters:
inputs (tf.Tensor) – input tensor of shape (rows, cols, channels)
filters (int) – the dimensionality of the output space (i.e. the number of output filters in the convolution).
kernel_size (int or tuple of 2 integers) – specifying the height and width of the 2D convolution kernel. Can be a single integer to specify the same value for all spatial dimensions.
add_batchnorm (bool, optional) – add a BatchNormalization layer. Defaults to False.
relu_activation (str, optional) – the ReLU activation to add to the layer in the form ‘ReLUx’ where ‘x’ is the max_value to use. Set to False to disable activation. Defaults to ‘ReLU8’.
**kwargs – arguments passed to the keras.Conv2DTranspose layer, such as strides, padding, use_bias, weight_regularizer, etc.
- Returns:
output tensor of transposed convolution block.
- Return type:
tf.Tensor
- akida_models.layer_blocks.sepconv_transpose_block(inputs, filters, kernel_size, strides=2, padding='same', use_bias=True, add_batchnorm=False, relu_activation='ReLU3.75', name=None, kernel_initializer='glorot_uniform', pointwise_regularizer=None)[source]
Adds a transposed separable convolutional layer with optional layers in the following order: batch normalization, activation.
The separable operation is made of a DepthwiseConv2DTranspose followed by a pointwise Conv2D.
- Parameters:
inputs (tf.Tensor) – input tensor of shape (rows, cols, channels)
filters (int) – the dimensionality of the output space (i.e. the number of output filters in the pointwise convolution).
kernel_size (int or tuple of 2 integers) – specifying the height and width of the depthwise transpose kernel. Can be a single integer to specify the same value for all spatial dimensions.
strides (int or tuple of 2 integers, optional) – strides of the transposed depthwise. Defaults to 2.
padding (str, optional) – padding mode for the transposed depthwise. Defaults to ‘same’.
use_bias (bool, optional) – whether the layer uses a bias vector. Defaults to True.
add_batchnorm (bool, optional) – add a BatchNormalization layer. Defaults to False.
relu_activation (str, optional) – the ReLU activation to add to the layer in the form ‘ReLUx’ where ‘x’ is the max_value to use. Set to False to disable activation. Defaults to ‘ReLU3.75’.
name (str, optional) – name of the layer. Defaults to None.
kernel_initializer (keras.initializer, optional) – initializer for both kernels. Defaults to ‘glorot_uniform’.
pointwise_regularizer (keras.regularizers, optional) – regularizer function applied to the pointwise kernel matrix. Defaults to None.
- Returns:
output tensor of transposed separable convolution block.
- Return type:
tf.Tensor
Detection block
- akida_models.layer_blocks.yolo_head_block(x, num_boxes, classes, filters=1024)[source]
Adds the YOLOv2 detection head, at the output of a model.
- Parameters:
x (tf.Tensor) – input tensor of shape (rows, cols, channels).
num_boxes (int) – number of boxes.
classes (int) – number of classes.
filters (int, optional) – number of filters in hidden layers. Defaults to 1024.
- Returns:
output tensor of yolo detection head block.
- Return type:
tf.Tensor
Notes
This block replaces conv layers by separable_conv, to decrease the amount of parameters.
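The parameter saving mentioned in the note can be estimated with simple arithmetic. The sketch below counts kernel weights only (biases ignored) and assumes a 3x3 kernel, which is an illustrative choice and not taken from the block's source.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise + pointwise (separable) convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per channel
    pointwise = in_channels * out_channels               # 1x1 convolution
    return depthwise + pointwise


standard = conv_params(3, 1024, 1024)
separable = separable_conv_params(3, 1024, 1024)
print(standard)   # 9437184
print(separable)  # 1057792
```

With these assumed sizes, the separable variant needs roughly nine times fewer weights than the standard convolution.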
Helpers
Gamma constraint
- akida_models.gamma_constraint.add_gamma_constraint(model)[source]
Helper that adds a MinValueConstraint to an existing model so that the gamma values of its BatchNormalization layers stay above a defined minimum.
This is typically used to help produce a model that is Akida compatible after conversion. In some cases, the mapping on hardware will fail because of huge values for threshold or act_step, with a message indicating that a value cannot fit in a 20 bit signed or unsigned integer. In such a case, this helper can be called to apply a constraint that can fix the issue.
Note that in order for the constraint to be applied to the actual weights, some training must be done: for an already trained model, it can be on a few batches, one epoch or more depending on the impact the constraint has on accuracy. This helper can also be called on a new model that has not been trained yet.
- Parameters:
model (keras.Model) – the model for which gamma constraints will be added.
- Returns:
the same model with BatchNormalization layers updated.
- Return type:
keras.Model
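For reference, the 20 bit limits mentioned in the mapping error can be checked as follows. This is a generic sketch of integer range checking with a hypothetical fits_in_bits helper; it does not reproduce the checks performed by the Akida mapping itself.

```python
def fits_in_bits(value, bits=20, signed=True):
    """Return True if an integer is representable on the given number of bits."""
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return low <= value <= high


print(fits_in_bits(524287))                 # True: largest 20 bit signed value
print(fits_in_bits(524288))                 # False
print(fits_in_bits(1048575, signed=False))  # True: largest 20 bit unsigned value
```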
Unfusing SeparableConvolutional
- akida_models.unfuse_sepconv_layers.unfuse_sepconv2d(model)[source]
Unfuse the SeparableConv2D layers of a model by replacing them with an equivalent DepthwiseConv2D + (pointwise)Conv2D layers.
- Parameters:
model (keras.Model) – the model to update
- Returns:
the original model or a new model with unfused SeparableConv2D layers
- Return type:
keras.Model
Extract samples
- akida_models.extract.extract_samples(out_file, dataset, nb_samples=1024, dtype='uint8')[source]
Extracts samples from a dataset and saves them to an npz file.
- Parameters:
out_file (str) – name of output file
dataset (numpy.ndarray or tf.data.Dataset) – dataset from which to extract samples
nb_samples (int, optional) – number of samples. Defaults to 1024.
dtype (str or np.dtype, optional) – the dtype to cast the samples. Defaults to “uint8”.
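The behaviour described above can be approximated with plain NumPy. In this sketch, save_samples is a hypothetical stand-in for a NumPy array input, and the ‘data’ key used inside the npz file is an arbitrary choice for illustration; the key written by extract_samples is not documented here.

```python
import os
import tempfile

import numpy as np


def save_samples(out_file, samples, nb_samples=1024, dtype="uint8"):
    """Keep the first nb_samples items, cast them and store them in an npz file.

    Hypothetical stand-in for illustration; not the akida_models implementation.
    """
    subset = np.asarray(samples[:nb_samples]).astype(dtype)
    np.savez(out_file, data=subset)  # 'data' is an assumed key name


rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2000, 8, 8, 3))  # fake image dataset

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.npz")
    save_samples(path, images)
    with np.load(path) as archive:
        loaded = archive["data"]

print(loaded.shape, loaded.dtype)  # (1024, 8, 8, 3) uint8
```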
Knowledge distillation
- class akida_models.distiller.Distiller(*args, **kwargs)[source]
The class that will be used to train the student model using the knowledge distillation method.
Reference Hinton et al. (2015).
- Parameters:
student (keras.Model) – the student model
teacher (keras.Model) – the well trained teacher model
alpha (float, optional) – weight applied to student_loss_fn; 1-alpha is applied to distillation_loss_fn. Defaults to 0.1
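The role of alpha can be sketched as a weighted sum of the two losses. The function name combined_loss and the scalar loss values are hypothetical; the sketch only reflects the weighting described for the alpha parameter.

```python
def combined_loss(student_loss, distillation_loss, alpha=0.1):
    """Weight the student loss by alpha and the distillation loss by 1 - alpha."""
    return alpha * student_loss + (1.0 - alpha) * distillation_loss


# With the default alpha, the distillation term dominates the total.
total = combined_loss(student_loss=2.0, distillation_loss=1.0)
print(total)
```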
- class akida_models.distiller.DeitDistiller(*args, **kwargs)[source]
Distiller class to train the student model using the Knowledge Distillation (KD) method described in https://arxiv.org/pdf/2012.12877.pdf
The main difference with the classic KD is that the student has to produce two potential classification outputs. This type of training is based on the assumption that each output has sufficiently interacted with the whole model, therefore the main architecture can be trained through two different sources, as follows:
>>> output, output_kd = student(input)
>>> output_tc = teacher(input)
>>> student_loss = student_loss_fn(y_true, output)
>>> distillation_loss = distillation_loss_fn(output_tc, output_kd)
This means we will expect to have different inputs for each loss, unlike classical KD, where the student’s prediction is shared for both losses. However, given that each classifier has interacted with the student model, the gradient of each loss will contribute to the update of the model weights according to the alpha percentage.
- Parameters:
student (keras.Model) – the student model
teacher (keras.Model) – the well trained teacher model
alpha (float, optional) – weight applied to student_loss_fn; 1-alpha is applied to distillation_loss_fn. Defaults to 0.1
temperature (float, optional) – when distiller_type is set to ‘soft’ at compile time, this value is used as the temperature parameter of KLDistillationLoss. Defaults to 1.0.
- akida_models.distiller.KLDistillationLoss(temperature=3)[source]
The KLDistillationLoss is a simple wrapper around the KLDivergence loss that accepts raw predictions instead of probability distributions.
Before invoking the KLDivergence loss, it converts the input predictions to probabilities by dividing them by a constant ‘temperature’ and applying a softmax.
- Parameters:
temperature (float) – temperature for softening probability distributions. Larger temperature gives softer distributions.
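The temperature softening described above can be sketched in pure Python. The function names below are hypothetical and the code is not the akida_models implementation; it assumes the teacher logits play the role of the reference distribution in the KL divergence.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_distillation_loss(teacher_logits, student_logits, temperature=3.0):
    """Soften both sets of raw predictions, then compare them with KL."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)


teacher = [1.0, 2.0, 3.0]
student = [1.5, 1.5, 3.0]
print(softmax(teacher, temperature=1.0))
print(softmax(teacher, temperature=3.0))  # flatter than with temperature 1.0
print(kl_distillation_loss(teacher, student))
```

Raising the temperature lowers the largest probability and raises the smallest ones, which is the softening effect mentioned for the temperature parameter.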
MACS
- akida_models.macs.get_flops(model)[source]
Calculate FLOPS for a tf.keras.Model or tf.keras.Sequential model in inference mode.
It uses tf.compat.v1.profiler under the hood.
- Parameters:
model (keras.Model) – the model to evaluate
- Returns:
object containing the FLOPS
- Return type:
tf.compat.v1.profiler.GraphNodeProto
Model I/O
- akida_models.model_io.load_model(model_path, custom_layers=None, compile_model=True)[source]
Loads an ONNX, Keras or quantized model. An error is raised if the provided model extension is not supported.
- Parameters:
model_path (str) – path of the model to load.
custom_layers (dict, optional) – custom layers to add to the Keras model. Defaults to None.
compile_model (bool, optional) – whether to compile the Keras model. Defaults to True.
- Returns:
Loaded model.
- Return type:
keras.models.Model or onnx.ModelProto or Akida.Model
- Raises:
ValueError – if the model could not be loaded using Keras and ONNX loaders.
- akida_models.model_io.load_weights(model, weights_path)[source]
Loads weights from an npz file and applies them to a model.
Go through the dictionary of weights of the npz file, find the corresponding variable in the model and partially load its weights.
- Parameters:
model (keras.Model) – the model to update
weights_path (str) – the path of the npz file to load
- akida_models.model_io.save_weights(model, weights_path)[source]
Saves model weights to an npz file.
Takes a model and saves the weights of all its layers into an npz file.
- Parameters:
model (keras.Model) – the model to save its weights
weights_path (str) – the path of the npz file to save
- akida_models.model_io.get_model_path(subdir='', model_name_v1=None, file_hash_v1=None, model_name_v2=None, file_hash_v2=None)[source]
Selects the model file on the server depending on the AkidaVersion.
The model path, model name and its hash depend on the Akida version context.
- Parameters:
subdir (str, optional) – the subdirectory where the model is on the data server. Defaults to “”.
model_name_v1 (str, optional) – the model v1 name. Defaults to None.
file_hash_v1 (str, optional) – the model file v1 hash. Defaults to None.
model_name_v2 (str, optional) – the model v2 name. Defaults to None.
file_hash_v2 (str, optional) – the model file v2 hash. Defaults to None.
- Returns:
the model path, model name and file hash.
- Return type:
str, str, str
Utils
- akida_models.utils.fetch_file(origin, fname=None, file_hash=None, cache_subdir='datasets', extract=False, cache_dir=None)[source]
Downloads a file from a URL if it is not already in the cache.
Reimplements keras.utils.get_file without raising an error when detecting a file_hash mismatch (it will just re-download the model).
- Parameters:
origin (str) – original URL of the file.
fname (str, optional) – name of the file. If an absolute path /path/to/file.txt is specified the file will be saved at that location. If None, the name of the file at origin will be used. Defaults to None.
file_hash (str, optional) – the expected hash string of the file after download. Defaults to None.
cache_subdir (str, optional) – subdirectory under the Keras cache dir where the file is saved. If an absolute path /path/to/folder is specified the file will be saved at that location. Defaults to ‘datasets’.
extract (bool, optional) – True tries extracting the file as an Archive, like tar or zip. Defaults to False.
cache_dir (str, optional) – location to store cached files, when directory does not exist it defaults to /tmp/.keras, when None it defaults to the default directory ~/.keras/. Defaults to None.
- Returns:
path to the downloaded file
- Return type:
str
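The cache behaviour described above (re-download on a hash mismatch instead of raising) can be sketched with the standard library. The helper names are hypothetical and SHA-256 is an assumption; the hash algorithm used by fetch_file is not stated here.

```python
import hashlib
import os
import tempfile


def file_matches_hash(path, expected_hash, chunk_size=65536):
    """Compare the SHA-256 digest of a file with an expected hex string."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hash


def needs_download(path, expected_hash=None):
    """Decide whether a cached file must be (re-)downloaded.

    A mismatch is reported as 'download again' rather than as an error.
    """
    if not os.path.exists(path):
        return True
    if expected_hash is None:
        return False
    return not file_matches_hash(path, expected_hash)


with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, "model.bin")
    with open(cached, "wb") as f:
        f.write(b"weights")
    good_hash = hashlib.sha256(b"weights").hexdigest()
    fresh = needs_download(cached, good_hash)
    stale = needs_download(cached, "0" * 64)
    missing = needs_download(os.path.join(tmp, "absent.bin"), good_hash)

print(fresh, stale, missing)  # False True True
```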
- akida_models.utils.get_tensorboard_callback(out_dir, histogram_freq=1, prefix='')[source]
Builds a TensorBoard callback pointing to the output directory.
- Parameters:
out_dir (str) – parent directory of the folder to create
histogram_freq (int, optional) – frequency to export logs. Defaults to 1.
prefix (str, optional) – prefix name. Defaults to ‘’.
- akida_models.utils.get_params_by_version(relu_v2='ReLU3.75')[source]
Provides the layer parameters depending on Akida version
With Akida v1, sepconv are fused, the ReLU max value is 6. With Akida v2, sepconv are unfused, the ReLU max value is “relu_v2” and the ReLU is at the end of the block with GAP.
- Parameters:
relu_v2 (str, optional) – ReLUx string when targeting V2. Defaults to ‘ReLU3.75’.
- Returns:
fused, post_relu_gap, relu_activation
- Return type:
bool, bool, str
Model zoo
AkidaNet
ImageNet
- akida_models.akidanet_imagenet(input_shape=None, alpha=1.0, include_top=True, pooling=None, classes=1000, input_scaling=(128, -1))[source]
Instantiates the AkidaNet architecture.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple, optional) – shape tuple. Defaults to None.
alpha (float, optional) –
controls the width of the model. Defaults to 1.0.
If alpha < 1.0, proportionally decreases the number of filters in each layer.
If alpha > 1.0, proportionally increases the number of filters in each layer.
If alpha = 1, default number of filters from the paper are used at each layer.
include_top (bool, optional) – whether to include the fully-connected layer at the top of the model. Defaults to True.
pooling (str, optional) –
optional pooling mode for feature extraction when include_top is False. Defaults to None.
None means that the output of the model will be the 4D tensor output of the last convolutional block.
avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.
classes (int, optional) – optional number of classes to classify images into, only to be specified if include_top is True. Defaults to 1000.
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (128, -1). Note that following Akida convention, the scale factor is an integer used as a divisor.
- Returns:
a Keras model for AkidaNet/ImageNet.
- Return type:
keras.Model
- Raises:
ValueError – in case of invalid input shape.
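The effect of input_scaling on raw pixel values can be sketched as follows. The rescale helper is hypothetical, and it assumes the offset is added after dividing by the scale factor, which is how a Rescaling layer with scale 1/128 and offset -1 would behave.

```python
def rescale(pixel, input_scaling=(128, -1)):
    """Apply the (scale, offset) pair, using the scale factor as a divisor."""
    scale, offset = input_scaling
    return pixel / scale + offset


print(rescale(0))              # -1.0
print(rescale(128))            # 0.0
print(rescale(255))            # 0.9921875
print(rescale(255, (255, 0)))  # 1.0
```

Under this assumption, the default (128, -1) maps the [0, 255] input range to approximately [-1, 1).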
- akida_models.akidanet_imagenet_pretrained(alpha=1.0, quantized=True)[source]
Helper method to retrieve an akidanet_imagenet model that was trained on ImageNet dataset.
- Parameters:
alpha (float, optional) – width of the model, allowed values in [0.25, 0.5, 1]. Defaults to 1.0.
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet_edge_imagenet(base_model, classes, base_layer='classifier')[source]
Instantiates an AkidaNet-edge architecture.
- Parameters:
base_model (str/keras.Model) – an akidanet_imagenet base model.
classes (int) – the number of classes for the edge classifier.
base_layer (str, optional) – the last base layer. Defaults to “classifier”.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet_edge_imagenet_pretrained(quantized=True)[source]
Helper method to retrieve an akidanet_edge_imagenet model that was trained on ImageNet dataset.
- Parameters:
quantized (bool) – a boolean indicating whether the model should be loaded quantized or not
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet18_imagenet(input_shape=None, include_top=True, pooling=None, classes=1000, depths=(4, 4, 4, 4), dimensions=(64, 128, 256, 512), input_scaling=(128, -1))[source]
Instantiates the AkidaNet18 architecture.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple, optional) – shape tuple. Defaults to None.
include_top (bool, optional) – whether to include the fully-connected layer at the top of the model. Defaults to True.
pooling (str, optional) –
optional pooling mode for feature extraction when include_top is False. Defaults to None.
None means that the output of the model will be the 4D tensor output of the last convolutional block.
avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.
classes (int, optional) – optional number of classes to classify images into, only to be specified if include_top is True. Defaults to 1000.
depths (tuple, optional) – number of layers in each stage of the model. The length of the tuple defines the number of stages. Defaults to (4, 4, 4, 4).
dimensions (tuple, optional) – number of filters in each stage of the model. The length of the tuple must be equal to the length of the depths tuple. Defaults to (64, 128, 256, 512).
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (128, -1). Note that following Akida convention, the scale factor is an integer used as a divisor.
- Returns:
a Keras model for AkidaNet/ImageNet.
- Return type:
keras.Model
- Raises:
ValueError – in case of invalid input shape or mismatching depth and dimensions.
- akida_models.akidanet18_imagenet_pretrained(quantized=True)[source]
Helper method to retrieve an akidanet18_imagenet model that was trained on ImageNet dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet_faceidentification_pretrained(quantized=True)[source]
Helper method to retrieve an akidanet_imagenet model that was trained on CASIA Webface dataset and that performs face identification.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet_faceidentification_edge_pretrained(quantized=True)[source]
Helper method to retrieve an akidanet_edge_imagenet model that was trained on CASIA Webface dataset and that performs face identification.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet_plantvillage_pretrained(quantized=True)[source]
Helper method to retrieve an akidanet_imagenet model that was trained on PlantVillage dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akidanet_vww_pretrained(quantized=True)[source]
Helper method to retrieve an akidanet_imagenet model that was trained on VWW dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
Preprocessing
- akida_models.imagenet.get_preprocessed_samples(image_size=224, num_channels=3)[source]
Loads and preprocesses 10 ImageNet-like images for testing.
- Parameters:
image_size (int, optional) – The target size for the images. Defaults to 224.
num_channels (int, optional) – The number of channels in the images. Defaults to 3.
- Returns:
4D and 1D numpy arrays of the preprocessed images and their corresponding labels
- Return type:
x_test, labels_test (tuple)
- akida_models.imagenet.preprocessing.preprocess_image(image, image_size, training=False, data_aug=None)[source]
ImageNet data preprocessing.
Preprocessing includes cropping, and resizing for both training and validation images. Training preprocessing introduces some random distortion of the image to improve accuracy.
- Parameters:
image (tf.Tensor) – input image as a 3-D tensor
image_size (tuple) – desired image size
training (bool, optional) – True for training preprocessing, False for validation and inference. Defaults to False.
data_aug (keras.Sequential, optional) – data augmentation. Defaults to None.
- Returns:
preprocessed image
- Return type:
tensorflow.Tensor
Mobilenet
ImageNet
- akida_models.mobilenet_imagenet(input_shape=None, alpha=1.0, dropout=0.001, include_top=True, pooling=None, classes=1000, use_stride2=True, input_scaling=(128, -1))[source]
Instantiates the MobileNet architecture.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple, optional) – shape tuple. Defaults to None.
alpha (float, optional) –
controls the width of the model. Defaults to 1.0.
If alpha < 1.0, proportionally decreases the number of filters in each layer.
If alpha > 1.0, proportionally increases the number of filters in each layer.
If alpha = 1, default number of filters from the paper are used at each layer.
dropout (float, optional) – dropout rate. Defaults to 1e-3.
include_top (bool, optional) – whether to include the fully-connected layer at the top of the model. Defaults to True.
pooling (str, optional) –
optional pooling mode for feature extraction when include_top is False. Defaults to None.
None means that the output of the model will be the 4D tensor output of the last convolutional block.
avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.
classes (int, optional) – optional number of classes to classify images into, only to be specified if include_top is True. Defaults to 1000.
use_stride2 (bool, optional) – replace max pooling operations by stride 2 convolutions in layers separable 2, 4, 6 and 12. Defaults to True.
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (128, -1). Note that following Akida convention, the scale factor is an integer used as a divisor.
- Returns:
a Keras model for MobileNet/ImageNet.
- Return type:
keras.Model
- Raises:
ValueError – in case of invalid input shape.
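The width multiplier alpha can be illustrated with a short sketch. The scaled_filters helper and the base filter counts are hypothetical, and truncating with int() is an assumption; the library may round differently.

```python
def scaled_filters(base_filters, alpha=1.0):
    """Scale a layer's filter count proportionally to alpha."""
    return int(base_filters * alpha)


base = [32, 64, 128]  # illustrative filter counts, not taken from the model
print([scaled_filters(f, alpha=0.5) for f in base])  # [16, 32, 64]
print([scaled_filters(f, alpha=1.0) for f in base])  # [32, 64, 128]
```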
- akida_models.mobilenet_imagenet_pretrained(alpha=1.0, quantized=True)[source]
Helper method to retrieve a mobilenet_imagenet model that was trained on ImageNet dataset.
- Parameters:
alpha (float) – width of the model.
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
DS-CNN
KWS
- akida_models.ds_cnn_kws(input_shape=(49, 10, 1), classes=33, include_top=True, input_scaling=(255, 0))[source]
Instantiates a MobileNet-like model for the “Keyword Spotting” example.
This model is based on the MobileNet architecture, mainly with fewer layers. The weights and activations are quantized such that it can be converted into an Akida model.
This architecture originates from https://arxiv.org/pdf/1711.07128.pdf and was created for the “Keyword Spotting” (KWS) or “Speech Commands” dataset.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple, optional) – input shape tuple of the model. Defaults to (49, 10, 1).
classes (int, optional) – optional number of classes to classify words into, only to be specified if include_top is True. Defaults to 33.
include_top (bool, optional) – whether to include the classification layer at the top of the model. Defaults to True.
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (255, 0). Note that following Akida convention, the scale factor is an integer used as a divisor.
- Returns:
a Keras model for MobileNet/KWS
- Return type:
keras.Model
- akida_models.ds_cnn_kws_pretrained(quantized=True)[source]
Helper method to retrieve a ds_cnn_kws model that was trained on KWS dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
Preprocessing
- class akida_models.kws.preprocessing.AudioProcessor(sample_rate, clip_duration_ms, window_size_ms, window_stride_ms, feature_bin_count, data_url=None, data_dir=None, silence_percentage=0, unknown_percentage=0, wanted_words=None, validation_percentage=0, testing_percentage=0)[source]
Handles loading, partitioning, and preparing audio training data.
Methods:
get_augmented_data_for_wav(wav_filename, ...) – Applies the feature transformation process to a wav audio file, adding data augmentation (background noise and time shifting).
get_data(how_many, offset, ...) – Gather samples from the data set, applying transformations as needed.
get_features_for_wav(wav_filename) – Applies the feature transformation process to the input_wav.
maybe_download_and_extract_dataset(data_url, ...) – Download and extract data set tar file.
Searches a folder for background noise audio, and loads it into memory.
prepare_data_index(silence_percentage, ...) – Prepares a list of the samples organized by set and label.
Builds a TensorFlow graph to apply the input distortions.
- get_augmented_data_for_wav(wav_filename, background_frequency, background_volume_range, time_shift, num_augmented_samples=1)[source]
Applies the feature transformation process to a wav audio file, adding data augmentation (background noise and time shifting).
- Parameters:
wav_filename (str) – The path to the input audio file.
background_frequency – How many clips will have background noise, 0.0 to 1.0.
background_volume_range – How loud the background noise will be.
time_shift – How much to randomly shift the clips by in time.
num_augmented_samples – How many samples will be generated using data augmentation.
- Returns:
Numpy data array containing the generated features for every augmented sample.
- get_data(how_many, offset, background_frequency, background_volume_range, time_shift, mode)[source]
Gather samples from the data set, applying transformations as needed.
When the mode is ‘training’, a random selection of samples will be returned, otherwise the first N clips in the partition will be used. This ensures that validation always uses the same samples, reducing noise in the metrics.
- Parameters:
how_many – Desired number of samples to return. -1 means the entire contents of this partition.
offset – Where to start when fetching deterministically.
background_frequency – How many clips will have background noise, 0.0 to 1.0.
background_volume_range – How loud the background noise will be.
time_shift – How much to randomly shift the clips by in time.
mode – Which partition to use, must be ‘training’, ‘validation’, or ‘testing’.
- Returns:
List of sample data for the transformed samples, and list of label indexes
- Raises:
ValueError – If background samples are too short.
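The selection rule described for get_data (random picks in ‘training’ mode, the first N clips otherwise) can be sketched as follows. This is only an illustration of the index selection, not the library's code: the function name select_sample_indices is hypothetical, and sampling with replacement in training mode is an assumption.

```python
import random


def select_sample_indices(partition_size, how_many, offset, mode, rng=None):
    """Return the indices of the samples to fetch from one partition."""
    rng = rng or random.Random()
    if how_many == -1:
        # -1 means the entire contents of the partition, in order.
        return list(range(partition_size))
    if mode == "training":
        # Random selection (assumed to be with replacement).
        return [rng.randrange(partition_size) for _ in range(how_many)]
    # Deterministic selection: the first clips starting at `offset`,
    # clipped so that we never read past the end of the partition.
    count = max(0, min(how_many, partition_size - offset))
    return list(range(offset, offset + count))


print(select_sample_indices(10, 3, 2, "validation"))
```

Because the validation and testing branches never touch the random generator, repeated evaluations see exactly the same samples.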
- get_features_for_wav(wav_filename)[source]
Applies the feature transformation process to the input_wav.
Runs the feature generation process (generally producing a spectrogram from the input samples) on the WAV file. This can be useful for testing and verifying implementations being run on other platforms.
- Parameters:
wav_filename – The path to the input audio file.
- Returns:
Numpy data array containing the generated features.
- static maybe_download_and_extract_dataset(data_url, dest_directory)[source]
Download and extract data set tar file.
If the data set we’re using doesn’t already exist, this function downloads it from the TensorFlow.org website and unpacks it into a directory. If data_url is None, nothing is downloaded and the data directory is expected to already contain the correct files.
- Parameters:
data_url – Web location of the tar file containing the data set.
dest_directory – File path to extract data to.
- prepare_background_data()[source]
Searches a folder for background noise audio, and loads it into memory.
It’s expected that the background audio samples will be in a subdirectory named ‘_background_noise_’ inside the ‘data_dir’ folder, as .wavs that match the sample rate of the training data, but can be much longer in duration.
If the ‘_background_noise_’ folder doesn’t exist at all, this isn’t an error, it’s just taken to mean that no background noise augmentation should be used. If the folder does exist, but it’s empty, that’s treated as an error.
- Returns:
List of raw PCM-encoded audio samples of background noise.
- Raises:
Exception – If files aren’t found in the folder.
- prepare_data_index(silence_percentage, unknown_percentage, wanted_words, validation_percentage, testing_percentage)[source]
Prepares a list of the samples organized by set and label.
The training loop needs a list of all the available data, organized by which partition it should belong to, and with ground truth labels attached. This function analyzes the folders below the data_dir, figures out the right labels for each file based on the name of the subdirectory it belongs to, and uses a stable hash to assign it to a data set partition.
- Parameters:
silence_percentage – How much of the resulting data should be background.
unknown_percentage – How much should be audio outside the wanted classes.
wanted_words – Labels of the classes we want to be able to recognize.
validation_percentage – How much of the data set to use for validation.
testing_percentage – How much of the data set to use for testing.
- Returns:
Dictionary containing a list of file information for each set partition, and a lookup map for each class to determine its numeric index.
- Raises:
Exception – If expected files are not found.
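The idea of using a stable hash to assign each file to a partition can be sketched as below. The choice of SHA-1, the modulo-100 bucketing and the function name which_set are assumptions made for illustration; the text above only states that the hash is stable.

```python
import hashlib


def which_set(filename, validation_percentage, testing_percentage):
    """Assign a file to a partition from a hash of its name."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Stable value in [0, 100): it depends only on the file name.
    bucket = int(digest, 16) % 100
    if bucket < validation_percentage:
        return "validation"
    if bucket < validation_percentage + testing_percentage:
        return "testing"
    return "training"


print(which_set("yes/0a7c2a8d_nohash_0.wav", 10, 10))
```

Since the assignment depends only on the file name, a file stays in the same partition across runs, even when new files are added to the data set.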
VGG
UTK Face
- akida_models.vgg_utk_face(input_shape=(32, 32, 3), input_scaling=(127, -1))[source]
Instantiates a VGG-like model for the regression example on age estimation using UTKFace dataset.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple, optional) – input shape tuple of the model. Defaults to (32, 32, 3).
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (127, -1). Note that following Akida convention, the scale factor is an integer used as a divisor.
- Returns:
a Keras model for VGG/UTKFace
- Return type:
keras.Model
- akida_models.vgg_utk_face_pretrained(quantized=True)[source]
Helper method to retrieve a vgg_utk_face model that was trained on UTK Face dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
Preprocessing
YOLO
- akida_models.yolo_base(input_shape=(224, 224, 3), classes=1, nb_box=5, alpha=1.0, input_scaling=(127.5, -1))[source]
Instantiates the YOLOv2 architecture.
- Parameters:
input_shape (tuple, optional) – input shape tuple. Defaults to (224, 224, 3).
classes (int, optional) – number of classes to classify images into. Defaults to 1.
nb_box (int, optional) – number of anchors boxes to use. Defaults to 5.
alpha (float, optional) – controls the width of the model. Defaults to 1.0.
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (127.5, -1). Note that following Akida convention, the scale factor is a number used as a divisor.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.yolo_widerface_pretrained(quantized=True)[source]
Helper method to retrieve a yolo_base model that was trained on WiderFace dataset and the anchors that are needed to interpret the model output.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance and a list of anchors.
- Return type:
keras.Model, list
- akida_models.yolo_voc_pretrained(quantized=True)[source]
Helper method to retrieve a yolo_base model that was trained on PASCAL VOC2012 dataset for ‘person’ and ‘car’ classes only, and the anchors that are needed to interpret the model output.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance and a list of anchors.
- Return type:
keras.Model, list
Detection data
- akida_models.detection.data.get_detection_datasets(data_path, dataset_name, full_set=True)[source]
Loads VOC, Widerface or COCO data.
- Parameters:
data_path (str) – path to the folder containing tfrecords files for VOC, Widerface or COCO data.
dataset_name (str) – Name of the dataset. Choices in [coco, voc, widerface].
full_set (bool, optional) – When dataset is ‘voc’, set to False to limit to ‘car’ and ‘person’ labels. Defaults to True.
- Returns:
train and validation data, labels, sizes of train and validation data.
- Return type:
tf.dataset, tf.dataset, list, int, int
- akida_models.detection.voc.data.get_voc_dataset(data_path, labels=['car', 'person'], training=False)[source]
Loads voc dataset and builds a tf.dataset out of it.
- Parameters:
data_path (str) – path to the folder containing voc tar files
labels (list[str], optional) – list of labels of interest as strings. Defaults to [“car”, “person”].
training (bool, optional) – True to retrieve training data, False for validation. Defaults to False.
- Returns:
the requested dataset (train or validation), the list of labels and the dataset size.
- Return type:
tf.dataset, labels (list[str]), int
- akida_models.detection.coco.data.get_coco_dataset(data_path, training=False)[source]
Loads coco dataset and builds a tf.dataset out of it.
- Parameters:
data_path (str) – path to the folder containing coco tfrecords.
training (bool, optional) – True to retrieve training data, False for validation. Defaults to False.
- Returns:
the requested dataset (train or validation), labels and the dataset size.
- Return type:
tf.dataset, list, int
- akida_models.detection.widerface.data.get_widerface_dataset(data_path, training=False)[source]
Loads wider_face dataset and builds a tf.dataset out of it.
- Parameters:
data_path (str) – path to the folder containing widerface tfrecords.
training (bool, optional) – True to retrieve training data, False for validation. Defaults to False.
- Returns:
the requested dataset (train or validation) and the dataset size.
- Return type:
tf.dataset, int
Preprocessing
- akida_models.detection.preprocess_data.preprocess_dataset(dataset, input_shape, grid_size, labels, batch_size, aug_pipe, create_targets_fn, training=True, preserve_aspect_ratio=False, *args, **kwargs)[source]
Preprocesses the input dataset by applying the necessary image and label transformations.
- Parameters:
dataset (tf.data.Dataset) – The input dataset.
input_shape (tuple) – The desired input shape for the image.
grid_size (tuple) – The grid size used for YOLO target generation.
labels (list[str]) – List of class labels.
batch_size (int) – Batch size for the preprocessed dataset.
aug_pipe (iaa.Augmenter) – The augmentation pipeline.
create_targets_fn (callable) – Function for creating target labels. It should accept the following parameters: objects, grid_size, num_classes and other arguments such as anchors.
training (bool, optional) – Flag indicating whether the dataset is for training or not. Defaults to True.
preserve_aspect_ratio (bool, optional) – Whether aspect ratio is preserved during resizing. Defaults to False.
- Returns:
The preprocessed dataset.
- Return type:
dataset (tf.data.Dataset)
Utils
- akida_models.detection.data_utils.remove_empty_objects(sample)[source]
Remove samples with empty objects.
- Parameters:
sample (dict) – A dictionary representing a sample with object information. {‘image’, ‘objects’: {‘bbox’, ‘label’}}.
- Returns:
A boolean tensor indicating whether the sample has non-empty objects.
- Return type:
tf.Tensor
- akida_models.detection.data_utils.get_dataset_length(dataset)[source]
Get the length of a TF dataset.
- Parameters:
dataset (tf.data.Dataset) – A TF dataset containing elements.
- Returns:
The number of elements in the dataset.
- Return type:
int
- class akida_models.detection.data_utils.Coord[source]
Static class representing bounding box coordinates.
These values align with the TensorFlow Datasets (tfds) bounding box format. In tfds, the “bbox” feature is formatted as: tfds.features.BBox(ymin / height, xmin / width, ymax / height, xmax / width).
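A small sketch of the normalized (ymin, xmin, ymax, xmax) layout described above, converting a pixel box to the tfds-style format and back. The helper names to_tfds_bbox and from_tfds_bbox are hypothetical and only illustrate the ordering and normalization.

```python
def to_tfds_bbox(ymin, xmin, ymax, xmax, height, width):
    """Normalize pixel coordinates by the image height and width."""
    return (ymin / height, xmin / width, ymax / height, xmax / width)


def from_tfds_bbox(bbox, height, width):
    """Recover pixel coordinates from a normalized box."""
    ymin, xmin, ymax, xmax = bbox
    return (ymin * height, xmin * width, ymax * height, xmax * width)


# A box covering rows 50..150 and columns 100..300 of a 200x400 image.
normalized = to_tfds_bbox(50, 100, 150, 300, height=200, width=400)
restored = from_tfds_bbox(normalized, height=200, width=400)
print(normalized, restored)
```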
YOLO Toolkit
Processing
- akida_models.detection.processing.load_image(image_path)[source]
Loads an image from a path.
- Parameters:
image_path (string) – full path of the image to load
- Returns:
a Tensorflow image Tensor
- akida_models.detection.processing.preprocess_image(image, input_shape, affine_transform=None)[source]
Resize an image to the specified dimensions using either a normal resize or an affine transformation in order to preserve aspect ratio.
- Parameters:
image (np.ndarray) – input image with size represented as (h, w, c).
input_shape (tuple) – tuple containing desired image dimension in form of (h, w, c).
affine_transform (np.ndarray, optional) – A 2x3 affine transformation matrix. Defaults to None.
- Returns:
the resized image.
- Return type:
np.ndarray
- akida_models.detection.processing.get_affine_transform(source_point, source_size, dest_size, inverse=False)[source]
Construct an affine transformation matrix to map between source and destination sizes. Note that to construct an affine transformation we need three points.
- Parameters:
source_point (np.ndarray) – A point in the source image to be mapped, usually the center.
source_size (tuple) – The size of the source image in the form (width, height).
dest_size (tuple) – The desired image size in the form (width, height).
inverse (bool, optional) – If True, compute the inverse affine transformation. Defaults to False.
- Returns:
A 2x3 affine transformation matrix.
- Return type:
np.ndarray
- akida_models.detection.processing.apply_affine_transform_to_bboxes(bboxes, affine_transform)[source]
Apply an affine transformation to multiple bounding boxes.
- Parameters:
bboxes (np.ndarray) – A numpy array of shape (N, 4) representing N bounding boxes, each defined by the coordinates [y1, x1, y2, x2].
affine_transform (np.ndarray) – A 2x3 affine transformation matrix.
- Returns:
A numpy array of shape (N, 4) with the transformed bounding boxes, in the format of [y1, x1, y2, x2].
- Return type:
np.ndarray
- akida_models.detection.processing.desize_bboxes(bboxes, scores, raw_height, raw_width, input_height, input_width, preserve_aspect_ratio)[source]
Reverse the resizing of bounding boxes to match the original image dimensions.
This operation must be the inverse of the resizing applied during preprocessing in the validation or testing pipelines. The version defined here is for an aspect-ratio conserving resize, scaled by the longest side.
- Parameters:
bboxes (np.ndarray) – A numpy array of shape (N, 4) representing N bounding boxes, each defined by the coordinates [y1, x1, y2, x2].
raw_height (int) – The original height of the image.
raw_width (int) – The original width of the image.
input_height (int) – The height of the resized image used during processing.
input_width (int) – The width of the resized image used during processing.
preserve_aspect_ratio (bool) – Whether aspect ratio is preserved during resizing or not.
- Returns:
A numpy array of shape (N, 4) with the bounding boxes resized to match the original image dimensions, each defined by the coordinates [y1, x1, y2, x2].
- Return type:
np.ndarray
- akida_models.detection.processing.decode_output(output, anchors, nb_classes, obj_threshold=0.5, nms_threshold=0.5)[source]
Decodes a YOLO model output.
- Parameters:
output (tf.Tensor) – model output to decode
anchors (list) – list of anchors boxes
nb_classes (int) – number of classes
obj_threshold (float, optional) – confidence threshold for a box. Defaults to 0.5.
nms_threshold (float, optional) – non-maximum suppression threshold. Defaults to 0.5.
- Returns:
List of BoundingBox objects
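The roles of obj_threshold and nms_threshold can be illustrated with a minimal sketch of confidence filtering followed by greedy non-maximum suppression. It is class-agnostic and works on plain (x1, y1, x2, y2) tuples, which is a simplification; the function names and the strict/non-strict comparisons are assumptions, not the behaviour of decode_output itself.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, scores, obj_threshold=0.5, nms_threshold=0.5):
    """Drop low-confidence boxes, then suppress overlapping ones."""
    candidates = [(s, b) for s, b in zip(scores, boxes) if s > obj_threshold]
    # Visit boxes from most to least confident.
    candidates.sort(key=lambda item: item[0], reverse=True)
    kept = []
    for score, box in candidates:
        # Keep a box only if it does not overlap a better one too much.
        if all(iou(box, other) <= nms_threshold for _, other in kept):
            kept.append((score, box))
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_boxes(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the last box is removed by the confidence threshold.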
- akida_models.detection.processing.create_yolo_targets(objects, grid_size, num_classes, anchors)[source]
Creates YOLO-style targets tensor for the given objects.
- Parameters:
objects (dict) – Dictionary containing information about objects in the image, including labels and bounding boxes.
grid_size (tuple) – The grid size used for YOLO target generation.
num_classes (int) – The number of classes.
anchors (list) – List of anchor boxes.
- Returns:
The targets output tensor.
- Return type:
targets (tf.Tensor)
- class akida_models.detection.processing.BoundingBox(x1, y1, x2, y2, score=-1, classes=None)[source]
Utility class to represent a bounding box.
The box is defined by its top left corner (x1, y1), bottom right corner (x2, y2), label, score and classes.
Methods:
get_label() – Returns the label for this bounding box.
Returns the score for this bounding box.
iou(other) – Computes intersection over union ratio between this bounding box and another one.
- get_label()[source]
Returns the label for this bounding box.
- Returns:
Index of the label as an integer.
- iou(other)[source]
Computes intersection over union ratio between this bounding box and another one.
- Parameters:
other (BoundingBox) – the other bounding box for IOU computation
- Returns:
IOU value as a float
YOLO Data Augmentation
- akida_models.detection.data_augmentation.augment_sample(image, objects, aug_pipe, labels, flip, scale, offx, offy)[source]
Applies data augmentation to an image and its associated objects.
- Parameters:
image (np.ndarray) – the input image as a NumPy array.
objects (dict) – dictionary containing information about objects in the image, including labels and bounding boxes.
aug_pipe (iaa.Augmenter) – the augmentation pipeline.
labels (list) – list of labels of interest.
flip (bool) – binary value indicating whether to flip the image or not.
scale (float) – scaling factor for the image.
offx (int) – horizontal translation offset for the image.
offy (int) – vertical translation offset for the image.
- Returns:
augmented image and objects.
- Return type:
np.ndarray, dict
Performance
- class akida_models.detection.map_evaluation.MapEvaluation(model, val_data, num_valid, labels, anchors, period=1, obj_threshold=0.5, nms_threshold=0.5, max_box_per_image=10, preserve_aspect_ratio=False, is_keras_model=True, decode_output_fn=<function decode_output>)[source]
Evaluate a given dataset using a given model. Code originally from https://github.com/fizyr/keras-retinanet. Note that mAP is computed for IoU thresholds from 0.5 to 0.95 with a step size of 0.05.
- Parameters:
model (keras.Model) – model to evaluate.
val_data (dict) – dictionary containing validation data as obtained using preprocess_widerface.py module
num_valid (int) – the length of the validation dataset
labels (list) – list of labels as strings
anchors (list) – list of anchors boxes
period (int, optional) – interval, in epochs, at which the precision is printed. Defaults to 1 (once per epoch).
obj_threshold (float, optional) – confidence threshold for a box. Defaults to 0.5.
nms_threshold (float, optional) – non-maximum suppression threshold. Defaults to 0.5.
max_box_per_image (int, optional) – maximum number of detections per image. Defaults to 10.
preserve_aspect_ratio (bool, optional) – Whether aspect ratio is preserved during resizing. Defaults to False.
is_keras_model (bool, optional) – indicates whether the model is a Keras model (True) or an Akida model (False). Defaults to True.
decode_output_fn (Callable, optional) – function to decode model’s outputs. Defaults to decode_output() (yolo decode output function).
- Returns:
A dict mapping class names to mAP scores.
Methods:
evaluate_map() – Evaluates current mAP score on the model.
on_epoch_end(epoch[, logs]) – Keras callback called at the end of an epoch.
- evaluate_map()[source]
Evaluates current mAP score on the model. mAP is computed for IoU thresholds from 0.5 to 0.95 with a step size of 0.05.
- Returns:
a dictionary containing mAP for each threshold and a dictionary of labels containing mAP for each class.
- Return type:
tuple
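The threshold sweep mentioned above (IoU from 0.5 to 0.95 in steps of 0.05) can be sketched as follows. The helper names are hypothetical, and averaging the per-threshold values into a single figure is shown as the usual COCO-style convention rather than as the exact computation performed by evaluate_map.

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """List the IoU thresholds from start to stop, both included."""
    count = int(round((stop - start) / step)) + 1
    # Rounding avoids floating-point drift such as 0.6000000000000001.
    return [round(start + i * step, 2) for i in range(count)]


def mean_over_thresholds(ap_per_threshold):
    """Average a {threshold: average precision} mapping."""
    return sum(ap_per_threshold.values()) / len(ap_per_threshold)


thresholds = iou_thresholds()
print(thresholds)
```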
- on_epoch_end(epoch, logs=None)[source]
Keras callback called at the end of an epoch.
- Parameters:
epoch (int) – index of epoch.
logs (dict, optional) – metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For training epoch, the values of the Model’s metrics are returned. Example: {‘loss’: 0.2, ‘acc’: 0.7}. Defaults to None.
Anchors
- akida_models.detection.generate_anchors.generate_anchors(dataset, num_anchors=5, grid_size=(7, 7))[source]
Creates anchors by clustering dimensions of the ground truth boxes from the training dataset.
- Parameters:
dataset (tf.Dataset) – dataset used to generate anchors
num_anchors (int, optional) – number of anchors
grid_size (tuple, optional) – size of the YOLO grid
- Returns:
the computed anchors
- Return type:
list
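Clustering the dimensions of ground truth boxes can be sketched with a small k-means loop that uses IoU as the similarity measure, as popularized by YOLOv2. This is an assumption about the general technique, not a description of generate_anchors itself; the function names are hypothetical and the boxes are taken as (width, height) pairs already expressed in grid units.

```python
import random


def wh_iou(box, centroid):
    """IoU of two (w, h) boxes that share the same center."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes_wh, num_anchors, iterations=100, seed=0):
    """Cluster (w, h) pairs; each box joins the centroid with highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes_wh, num_anchors)
    for _ in range(iterations):
        clusters = [[] for _ in range(num_anchors)]
        for box in boxes_wh:
            best = max(range(num_anchors), key=lambda k: wh_iou(box, centroids[k]))
            clusters[best].append(box)
        updated = []
        for k, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                updated.append((mean_w, mean_h))
            else:
                # Keep the previous centroid when a cluster is empty.
                updated.append(centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


boxes_wh = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (4.0, 4.0), (4.2, 3.8), (3.8, 4.2)]
anchors = kmeans_anchors(boxes_wh, num_anchors=2)
print(anchors)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is why it is a common choice for anchor generation.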
Utils
- akida_models.detection.box_utils.xywh_to_xyxy(boxes)[source]
Convert a set of boxes from format xywh to xyxy, where each format represents:
‘xywh’: format of (‘cx’, ‘cy’, ‘w’, ‘h’), also called ‘centroids’ and
‘xyxy’: format of (‘x_min’, ‘y_min’, ‘x_max’, ‘y_max’), also called ‘corners’.
- Parameters:
boxes (tf.Tensor or np.ndarray) – tensor with shape (N, 4)
- Returns:
tensor with new format
- Return type:
tf.Tensor or np.ndarray
- akida_models.detection.box_utils.xyxy_to_xywh(boxes)[source]
Convert a set of boxes from format xyxy to xywh, where each format represents:
‘xyxy’: format of (‘x_min’, ‘y_min’, ‘x_max’, ‘y_max’), also called ‘corners’ and
‘xywh’: format of (‘cx’, ‘cy’, ‘w’, ‘h’), also called ‘centroids’.
- Parameters:
boxes (tf.Tensor) – tensor with shape (N, 4)
- Returns:
tensor with new format
- Return type:
tf.Tensor
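The relation between the ‘centroids’ and ‘corners’ formats can be shown on a single box with plain tuples. The helper names below are hypothetical single-box stand-ins; the library functions operate on tensors of shape (N, 4).

```python
def centroid_to_corners(box):
    """(cx, cy, w, h) -> (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corners_to_centroid(box):
    """(x_min, y_min, x_max, y_max) -> (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


corners = centroid_to_corners((5, 5, 4, 2))
print(corners, corners_to_centroid(corners))
```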
- akida_models.detection.box_utils.compute_overlap(a1, a2, mode='element_wise', box_format='xywh')[source]
Calculate IoUs between a1 and a2 in two different modes:
element_wise: compute IoU element-by-element, returning a 1D tensor,
outer_product: compute cross IoU with all possible combinations between inputs.
- Parameters:
a1 (tf.Tensor or np.ndarray) – set of boxes, with shape at least equal to (N, 4).
a2 (tf.Tensor or np.ndarray) – set of boxes, with compatible broadcast-shape (in ‘element_wise’ mode) or shape at least equal to (N, 4) (in ‘outer_product’ mode).
mode (str, optional) – the mode to use. ‘element_wise’ or ‘outer_product’. Defaults to “element_wise”.
box_format (str, optional) – format of both inputs. Defaults to ‘xywh’.
- Returns:
IoU between inputs with shape (N,) in ‘element_wise’, otherwise (N, M).
- Return type:
tf.Tensor or np.ndarray
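The difference between the two modes can be sketched with lists of corner boxes. The names box_iou and overlap are hypothetical, and the sketch uses the ‘xyxy’ format for readability, whereas compute_overlap defaults to ‘xywh’.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlap(boxes_a, boxes_b, mode="element_wise"):
    if mode == "element_wise":
        # Pair the i-th box of each input: result has shape (N,).
        return [box_iou(a, b) for a, b in zip(boxes_a, boxes_b)]
    if mode == "outer_product":
        # Compare every box of a with every box of b: shape (N, M).
        return [[box_iou(a, b) for b in boxes_b] for a in boxes_a]
    raise ValueError(f"unknown mode: {mode}")


a1 = [(0, 0, 2, 2), (0, 0, 1, 1)]
a2 = [(1, 0, 3, 2), (0, 0, 1, 1)]
pairwise = overlap(a1, a2, mode="element_wise")
cross = overlap(a1, a2, mode="outer_product")
print(pairwise, cross)
```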
- akida_models.detection.box_utils.compute_center_xy(bbox, grid_size)[source]
Computes the center coordinates (x, y) of a bounding box relative to the grid.
- Parameters:
bbox (tf.Tensor) – Bounding box coordinates (ymin, xmin, ymax, xmax).
grid_size (tuple) – The grid size in the format (h, w).
- Returns:
A tuple containing the center coordinates (center_x, center_y).
- Return type:
tuple
- akida_models.detection.box_utils.compute_center_wh(bbox, grid_size)[source]
Computes the width and height of a bounding box relative to a grid.
- Parameters:
bbox (tf.Tensor) – Bounding box coordinates (ymin, xmin, ymax, xmax).
grid_size (tuple) – The grid size in the format (h, w).
- Returns:
The width and height of the bounding box.
- Return type:
tuple
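Expressing a box center and size relative to the grid can be sketched as below, assuming the bounding box is normalized to [0, 1] (as in the tfds format) so that multiplying by the grid dimensions yields grid units. The helper names are hypothetical, and the normalization assumption is not stated for compute_center_xy or compute_center_wh.

```python
def center_on_grid(bbox, grid_size):
    """Center (x, y) of a normalized (ymin, xmin, ymax, xmax) box, in grid cells."""
    ymin, xmin, ymax, xmax = bbox
    grid_h, grid_w = grid_size
    return ((xmin + xmax) / 2 * grid_w, (ymin + ymax) / 2 * grid_h)


def size_on_grid(bbox, grid_size):
    """Width and height of the same box, in grid cells."""
    ymin, xmin, ymax, xmax = bbox
    grid_h, grid_w = grid_size
    return ((xmax - xmin) * grid_w, (ymax - ymin) * grid_h)


bbox = (0.0, 0.25, 0.5, 0.75)
print(center_on_grid(bbox, (4, 8)), size_on_grid(bbox, (4, 8)))
```

The integer part of the center then identifies the grid cell responsible for the object.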
PointNet++
ModelNet40
- akida_models.pointnet_plus_modelnet40(selected_points=64, features=3, knn_points=32, classes=40, alpha=1.0)[source]
Instantiates a PointNet++ model for the ModelNet40 classification.
This example implements the point cloud deep learning paper PointNet (Qi et al., 2017). For a detailed introduction on PointNet see this blog post.
PointNet++ is conceived as a repeated series of operations: sampling and grouping of points, followed by the trainable convnet itself. Those operations are then repeated at increased scale. Each of the selected points is taken as the centroid of the K-nearest neighbours. This defines a localized group.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
selected_points (int, optional) – the number of points to process per sample. Defaults to 64.
features (int, optional) – the number of features. Expected values are 1 or 3. Default is 3.
knn_points (int, optional) – the number of points to include in each localised group. Must be a power of 2, and ideally an integer square (so 64, or 16 for a deliberately small network, or 256 for large). Defaults to 32.
classes (int, optional) – the number of classes for the classifier. Default is 40.
alpha (float, optional) – network filters multiplier. Default is 1.0.
- Returns:
a quantized Keras model for PointNet++/ModelNet40.
- Return type:
keras.Model
- akida_models.pointnet_plus_modelnet40_pretrained(quantized=True)[source]
Helper method to retrieve a pointnet_plus model that was trained on ModelNet40 dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
Processing
- akida_models.modelnet40.preprocessing.get_modelnet_from_file(num_points, filename='ModelNet40.zip')[source]
Load the ModelNet data from file.
First parse through the ModelNet data folders. Each mesh is loaded and sampled into a point cloud before being added to a standard python list and converted to a numpy array. We also store the current enumerate index value as the object label and use a dictionary to recall this later.
- Parameters:
num_points (int) – number of points with which each mesh is sampled.
filename (str) – the dataset file to load if the npz file was not generated yet. Defaults to “ModelNet40.zip”.
- Returns:
train set, train labels, test set, test labels as numpy arrays and dict containing class folder name.
- Return type:
np.array, np.array, np.array, np.array, dict
- akida_models.modelnet40.preprocessing.get_modelnet(train_points, train_labels, test_points, test_labels, batch_size, selected_points=64, knn_points=32)[source]
Obtains the ModelNet dataset.
- Parameters:
train_points (numpy.array) – train set.
train_labels (numpy.array) – train labels.
test_points (numpy.array) – test set.
test_labels (numpy.array) – test labels.
batch_size (int) – size of the batch.
selected_points (int) – num points to process per sample. Defaults to 64.
knn_points (int) – number of points to include in each localised group. Must be a power of 2, and ideally an integer square (so 64, or 16 for a deliberately small network, or 256 for large). Defaults to 32.
- Returns:
train and test points with data augmentation.
- Return type:
tf.data.Dataset, tf.data.Dataset
GXNOR
MNIST
- akida_models.gxnor_mnist()[source]
Instantiates a Keras GXNOR model with an additional dense layer to improve classification.
The paper describing the original model can be found here.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Returns:
a Keras model for GXNOR/MNIST
- Return type:
keras.Model
- akida_models.gxnor_mnist_pretrained(quantized=True)[source]
Helper method to retrieve a gxnor_mnist model that was trained on MNIST dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
CenterNet
- akida_models.centernet_base(input_shape=(384, 384, 3), classes=20, input_scaling=(127, -1), separable_cutoff=64)[source]
A Keras Model implementing the CenterNet architecture, on top of an AkidaNet backbone
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (384, 384, 3).
classes (int, optional) – number of output classes. Defaults to 20.
input_scaling (tuple, optional) – input scaling. Defaults to (127, -1).
separable_cutoff (int, optional) – maximum number of filters for standard Conv layers. Layers with more filters than this will be defined as separable Convs. Defaults to 64.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.centernet_voc_pretrained(quantized=True)[source]
Helper method to retrieve a centernet_base model that was trained on VOC detection dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.centernet.centernet_processing.decode_output(output, nb_classes, obj_threshold=0.1, max_detections=100, kernel=5)[source]
Decodes a CenterNet model output.
- Parameters:
output (tf.Tensor) – model output to decode.
nb_classes (int) – number of classes.
obj_threshold (float, optional) – confidence threshold for a box. Defaults to 0.1.
max_detections (int, optional) – maximum number of boxes the model is allowed to produce. Defaults to 100.
kernel (int, optional) – max pool kernel size. Defaults to 5.
- Returns:
BoundingBox objects
- Return type:
List
- akida_models.centernet.centernet_utils.create_centernet_targets(objects, grid_size, num_classes)[source]
Creates Centernet-style targets tensor for the given objects.
- Parameters:
objects (dict) – Dictionary containing information about objects in the image, including labels and bounding boxes.
grid_size (tuple) – The grid size used for Centernet target generation.
num_classes (int) – The number of classes.
- Returns:
The targets output tensor.
- Return type:
targets (tf.Tensor)
- akida_models.centernet.centernet_utils.build_centernet_aug_pipeline()[source]
Defines a sequence of augmentation steps for Centernet training that will be applied to every image.
- Returns:
sequence of augmentation.
- Return type:
iaa.Sequential
- class akida_models.centernet.centernet_loss.CenternetLoss(alpha=2.0, gamma=4.0, eps=1e-12, heatmap_loss_weight=1.0, wh_loss_weight=0.1, offset_loss_weight=1.0)[source]
Computes CenterNet loss from a model raw output.
The CenterNet loss computation is from https://arxiv.org/abs/1904.07850.
- Parameters:
alpha (float, optional) – alpha parameter in heatmap loss. Defaults to 2.0.
gamma (float, optional) – gamma parameter in heatmap loss. Defaults to 4.0.
eps (float, optional) – epsilon parameter in heatmap loss. Defaults to 1e-12.
heatmap_loss_weight (float, optional) – heatmap loss weight. Defaults to 1.0.
wh_loss_weight (float, optional) – location loss weight. Defaults to 0.1.
offset_loss_weight (float, optional) – offset loss weight. Defaults to 1.0.
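As an illustration of the heatmap term, the penalty-reduced focal loss from the CenterNet paper can be sketched in plain Python. The sketch assumes that alpha and gamma play the roles of the paper's two exponents (2 and 4 by default) and that the three terms are combined as a weighted sum; neither point is confirmed above, and the function names are hypothetical.

```python
import math


def heatmap_focal_loss(pred, target, alpha=2.0, gamma=4.0, eps=1e-12):
    """Penalty-reduced focal loss over flattened heatmap values."""
    num_pos = sum(1 for t in target if t == 1.0)
    loss = 0.0
    for p, t in zip(pred, target):
        if t == 1.0:
            # Object center: penalize low predicted confidence.
            loss -= (1.0 - p) ** alpha * math.log(p + eps)
        else:
            # Elsewhere: penalize confidence, less so close to a center.
            loss -= (1.0 - t) ** gamma * p ** alpha * math.log(1.0 - p + eps)
    # Normalize by the number of object centers (at least 1).
    return loss / max(num_pos, 1)


def total_loss(heatmap, wh, offset, weights=(1.0, 0.1, 1.0)):
    """Weighted sum of the three terms (assumed combination)."""
    return weights[0] * heatmap + weights[1] * wh + weights[2] * offset


print(heatmap_focal_loss([0.5, 0.5], [1.0, 0.0]))
```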
AkidaUNet
- akida_models.akida_unet_portrait128(input_shape=(128, 128, 3), alpha=0.5, input_scaling=(128, -1))[source]
Instantiates an Akida U-Net architecture.
It is composed of an AkidaNet-ImageNet encoder followed by a succession of Conv2DTranspose layers for the decoder part. It does not contain any skip connection (concatenation) between the encoder and the decoder branches.
- Parameters:
input_shape (tuple, optional) – input shape tuple. Defaults to (128, 128, 3).
alpha (float, optional) – controls the width (number of filters) of the model. Defaults to 0.5.
input_scaling (tuple, optional) – scale factor and offset to apply to inputs. Defaults to (128, -1). Note that following Akida convention, the scale factor is a number used as a divisor.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
- akida_models.akida_unet_portrait128_pretrained(quantized=True)[source]
Helper method to retrieve an akida_unet model that was trained on portrait128 dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance.
- Return type:
keras.Model
Transformers
ViT
- akida_models.vit_imagenet(input_shape, patch_size, num_blocks, hidden_size, num_heads, name, mlp_dim, classes=1000, dropout=0.1, include_top=True, norm='LN', last_norm='LN', softmax='softmax', act='GeLU')[source]
Instantiates the ViT architecture.
The Vision Transformer (ViT) is a model for image classification that employs a Transformer-like architecture over patches of the image. An image is split into fixed-size patches, each of which is then linearly embedded; position embeddings are added, and the resulting sequence of vectors is fed to a standard Transformer encoder.
Please refer to https://arxiv.org/abs/2010.11929 for further details.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple) – image shape tuple
patch_size (int) – the size of each patch (must fit evenly in image size)
num_blocks (int) – the number of transformer blocks to use.
hidden_size (int) – the number of filters to use
num_heads (int) – the number of transformer heads
name (str) – the model name
mlp_dim (int) – the number of dimensions for the MLP output in the transformers.
classes (int, optional) – number of classes to classify images into, only to be specified if include_top is True. Defaults to 1000.
dropout (float, optional) – fraction of the units to drop for dense layers. Defaults to 0.1.
include_top (bool, optional) – whether to include the final classifier head. If False, the output will correspond to that of the transformer. Defaults to True.
norm (str, optional) – one of [‘LN’, ‘GN1’, ‘BN’, ‘LMN’], selecting LayerNormalization, GroupNormalization(groups=1, …), BatchNormalization or LayerMadNormalization layers respectively in the model. Defaults to ‘LN’.
last_norm (str, optional) – one of [‘LN’, ‘BN’], selecting LayerNormalization or BatchNormalization in the classifier network. Defaults to ‘LN’.
softmax (str, optional) – one of [‘softmax’, ‘softmax2’], selecting softmax or softmax2 in the multi-head attention (MHA). Defaults to ‘softmax’.
act (str, optional) – one of [‘GeLU’, ‘ReLUx’, ‘swish’], selecting the GeLU, ReLUx or swish activation in the MLP block. Defaults to ‘GeLU’.
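The patch-splitting step described for ViT can be sketched in plain Python. This is only an illustration of the idea on a single-channel toy image, not the akida_models implementation; split_into_patches is a hypothetical helper.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (a list of rows) into non-overlapping square patches."""
    rows, cols = len(image), len(image[0])
    if rows % patch_size or cols % patch_size:
        raise ValueError("patch_size must fit evenly in the image size")
    patches = []
    for top in range(0, rows, patch_size):
        for left in range(0, cols, patch_size):
            # Flatten each patch row by row, ready for a linear embedding.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# Toy 4x4 image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)
print(len(patches))
print(patches[0])
```

By the same arithmetic, a 224 x 224 input with patch_size=16 gives a 14 x 14 grid, that is 196 patches.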
- akida_models.vit_ti16(input_shape=(224, 224, 3), classes=1000, norm='LN', last_norm='LN', softmax='softmax', act='GeLU', include_top=True)[source]
Instantiates the ViT-Tiny 16 architecture; that is a ViT architecture with 3 attention heads, 12 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
norm (str, optional) – one of [‘LN’, ‘GN1’, ‘BN’, ‘LMN’], selecting LayerNormalization, GroupNormalization(groups=1, …), BatchNormalization or LayerMadNormalization layers respectively in the model. Defaults to ‘LN’.
last_norm (str, optional) – one of [‘LN’, ‘BN’], selecting LayerNormalization or BatchNormalization in the classifier network. Defaults to ‘LN’.
softmax (str, optional) – one of [‘softmax’, ‘softmax2’], selecting softmax or softmax2 in the attention block. Defaults to ‘softmax’.
act (str, optional) – one of [‘GeLU’, ‘ReLUx’, ‘swish’], selecting the GeLU, ReLUx or swish activation inside the MLP. Defaults to ‘GeLU’.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.bc_vit_ti16(input_shape=(224, 224, 3), classes=1000, include_top=True, num_blocks=12)[source]
Instantiates the ViT-Tiny 16 architecture adapted for implementation on hardware, that is:
LayerNormalization replaced by LayerMadNormalization,
GeLU replaced by ReLU8 activations,
Softmax replaced by shiftmax.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
num_blocks (int, optional) – the number of transformer blocks to use. Defaults to 12.
- Returns:
the requested model
- Return type:
keras.Model
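The hardware-oriented replacements listed above can be sketched in plain Python. ReLU8 follows the ‘ReLUx’ convention, where x is the max_value. The shiftmax part is an assumption: it is written here as a base-2 softmax whose logits are floored and whose normalizer is rounded to a power of two, and the actual definition used by the library may differ. Both function names are hypothetical.

```python
import math


def relu8(values):
    """ReLU clipped at a maximum value of 8."""
    return [min(max(v, 0.0), 8.0) for v in values]


def shiftmax_sketch(logits):
    """Softmax-like activation using only powers of two (assumed definition)."""
    floored = [math.floor(v) for v in logits]
    total = sum(2.0 ** v for v in floored)
    # Assumption: the sum is replaced by the nearest power of two (in log2 space),
    # so the division reduces to a bit shift on hardware.
    shift = round(math.log2(total))
    return [2.0 ** (v - shift) for v in floored]


print(relu8([-1.0, 3.0, 10.0]))
print(shiftmax_sketch([0.0, 1.0, 2.0]))
```

Because the normalizer is rounded, the outputs of this sketch do not necessarily sum to exactly 1, unlike a regular softmax.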
- akida_models.bc_vit_ti16_imagenet_pretrained(quantized=True)[source]
Helper method to retrieve a ViT-Tiny 16 model adapted for implementation on hardware, that is:
LayerNormalization replaced by LayerMadNormalization,
GeLU replaced by ReLU8 activations,
Softmax replaced by shiftmax,
and that was trained on the ImageNet dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance
- Return type:
keras.Model
- akida_models.vit_s16(input_shape=(224, 224, 3), classes=1000, include_top=True)[source]
Instantiates the ViT-Small 16 architecture; that is a ViT architecture with 6 attention heads, 12 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.vit_s32(input_shape=(224, 224, 3), classes=1000, include_top=True)[source]
Instantiates the ViT-Small 32 architecture; that is a ViT architecture with 6 attention heads, 12 blocks and a patch size of 32.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.vit_b16(input_shape=(224, 224, 3), classes=1000, include_top=True)[source]
Instantiates the ViT-B16 architecture; that is a ViT architecture with 12 attention heads, 12 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.vit_b32(input_shape=(224, 224, 3), classes=1000, include_top=True)[source]
Instantiates the ViT-B32 architecture; that is a ViT architecture with 12 attention heads, 12 blocks and a patch size of 32.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.vit_l16(input_shape=(384, 384, 3), classes=1000, include_top=True)[source]
Instantiates the ViT-L16 architecture; that is a ViT architecture with 16 attention heads, 24 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (384, 384, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.vit_l32(input_shape=(384, 384, 3), classes=1000, include_top=True)[source]
Instantiates the ViT-L32 architecture; that is a ViT architecture with 16 attention heads, 24 blocks and a patch size of 32.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (384, 384, 3).
classes (int, optional) – number of classes. Defaults to 1000.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
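The ViT variants documented above differ only in attention heads, block count, patch size and default input size. The following sketch gathers those documented values in a lookup table and derives the number of patches at the default input shape; the table and the helper name are illustrative and are not part of akida_models.

```python
# name: (attention heads, transformer blocks, patch size, default input side)
VIT_VARIANTS = {
    "vit_ti16": (3, 12, 16, 224),
    "vit_s16": (6, 12, 16, 224),
    "vit_s32": (6, 12, 32, 224),
    "vit_b16": (12, 12, 16, 224),
    "vit_b32": (12, 12, 32, 224),
    "vit_l16": (16, 24, 16, 384),
    "vit_l32": (16, 24, 32, 384),
}


def default_num_patches(name):
    """Number of image patches for a variant at its default square input."""
    _heads, _blocks, patch_size, side = VIT_VARIANTS[name]
    per_side = side // patch_size
    return per_side * per_side


for name in VIT_VARIANTS:
    print(name, default_num_patches(name))
```

The patch size drives the sequence length quadratically: going from a patch size of 16 to 32 at the same input size divides the number of patches by four.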
DeiT
- akida_models.deit_imagenet(input_shape, num_blocks, hidden_size, num_heads, name, mlp_dim, patch_size=16, classes=1000, dropout=0.1, include_top=True, distilled=False, norm='LN', last_norm='LN', softmax='softmax', act='GeLU')[source]
Instantiates the DeiT architecture.
The Data-efficient image Transformers (DeiT) is a model for image classification, requiring far less data and far less computing resources compared to the original ViT model. It relies on a teacher-student strategy specific to transformers (distillation token).
Please refer to https://arxiv.org/abs/2012.12877 for further details.
Note: input preprocessing is included as part of the model (as a Rescaling layer). This model expects inputs to be float tensors of pixels with values in the [0, 255] range.
- Parameters:
input_shape (tuple) – image shape tuple
num_blocks (int) – the number of transformer blocks to use.
hidden_size (int) – the number of filters to use
num_heads (int) – the number of transformer heads
name (str) – the model name
mlp_dim (int) – the number of dimensions for the MLP output in the transformers.
patch_size (int, optional) – the size of each patch (must fit evenly in image size). Defaults to 16.
classes (int, optional) – number of classes to classify images into, only to be specified if include_top is True. Defaults to 1000.
dropout (float, optional) – fraction of the units to drop for dense layers. Defaults to 0.1.
include_top (bool, optional) – whether to include the final classifier head. If False, the output will correspond to that of the transformer. Defaults to True.
distilled (bool, optional) – build the model with an appended distillation token. Defaults to False.
norm (str, optional) – one of [‘LN’, ‘GN1’, ‘BN’, ‘LMN’], selecting LayerNormalization, GroupNormalization(groups=1, …), BatchNormalization or LayerMadNormalization layers respectively in the model. Defaults to ‘LN’.
last_norm (str, optional) – one of [‘LN’, ‘BN’], selecting LayerNormalization or BatchNormalization in the classifier network. Defaults to ‘LN’.
softmax (str, optional) – one of [‘softmax’, ‘softmax2’], selecting softmax or softmax2 in the multi-head attention (MHA). Defaults to ‘softmax’.
act (str, optional) – one of [‘GeLU’, ‘ReLUx’, ‘swish’], selecting the GeLU, ReLUx or swish activation in the MLP block. Defaults to ‘GeLU’.
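The effect of the distilled flag on the token sequence can be sketched as follows. This assumes, as in the DeiT paper, one class token plus one extra distillation token when distilled is True; how akida_models lays out its tokens internally is not confirmed here, and deit_sequence_length is a hypothetical helper.

```python
def deit_sequence_length(input_shape=(224, 224, 3), patch_size=16, distilled=False):
    """Number of tokens seen by the transformer blocks (assumed layout)."""
    rows, cols, _channels = input_shape
    if rows % patch_size or cols % patch_size:
        raise ValueError("patch_size must fit evenly in the image size")
    num_patches = (rows // patch_size) * (cols // patch_size)
    # Assumption: one class token, plus one distillation token if distilled.
    extra_tokens = 2 if distilled else 1
    return num_patches + extra_tokens


print(deit_sequence_length())
print(deit_sequence_length(distilled=True))
```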
- akida_models.deit_ti16(input_shape=(224, 224, 3), classes=1000, distilled=False, norm='LN', last_norm='LN', softmax='softmax', act='GeLU', include_top=True)[source]
Instantiates the DeiT-Tiny 16 architecture; that is a DeiT architecture with 3 attention heads, 12 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
distilled (bool, optional) – build model appending a distilled token. Defaults to False.
norm (str, optional) – one of [‘LN’, ‘GN1’, ‘BN’, ‘LMN’], selecting LayerNormalization, GroupNormalization(groups=1, …), BatchNormalization or LayerMadNormalization layers respectively in the model. Defaults to ‘LN’.
last_norm (str, optional) – one of [‘LN’, ‘BN’], selecting LayerNormalization or BatchNormalization in the classifier network. Defaults to ‘LN’.
softmax (str, optional) – one of [‘softmax’, ‘softmax2’], selecting softmax or softmax2 in the attention block. Defaults to ‘softmax’.
act (str, optional) – one of [‘GeLU’, ‘ReLUx’, ‘swish’], selecting the GeLU, ReLUx or swish activation inside the MLP. Defaults to ‘GeLU’.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.bc_deit_ti16(input_shape=(224, 224, 3), classes=1000, distilled=False, include_top=True, num_blocks=12)[source]
Instantiates the DeiT-Tiny 16 architecture adapted for implementation on hardware, that is:
LayerNormalization replaced by LayerMadNormalization,
GeLU replaced by ReLU8 activations,
Softmax replaced by shiftmax.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
distilled (bool, optional) – build model appending a distilled token. Defaults to False.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
num_blocks (int, optional) – the number of transformer blocks to use. Defaults to 12.
- Returns:
the requested model
- Return type:
keras.Model
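To give an intuition for the LayerNormalization replacement mentioned above, here is a minimal sketch. It assumes LayerMadNormalization normalizes by the mean absolute deviation (MAD) instead of the standard deviation; the learnable scale and offset, as well as any epsilon term, are left out, and the real layer may differ in those details. The function name is hypothetical.

```python
def layer_mad_norm_sketch(values):
    """Center the values and divide by their mean absolute deviation."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # Assumption: MAD replaces the standard deviation used by LayerNormalization,
    # which avoids the square and square-root operations.
    mad = sum(abs(c) for c in centered) / len(centered)
    return [c / mad for c in centered]


print(layer_mad_norm_sketch([1.0, 2.0, 3.0, 4.0]))
```

Note that this sketch divides by zero for a constant input; a practical implementation would need to guard against that case.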
- akida_models.bc_deit_dist_ti16_imagenet_pretrained(quantized=True)[source]
Helper method to retrieve a DeiT-Tiny 16 model adapted for implementation on hardware, that is:
LayerNormalization replaced by LayerMadNormalization,
GeLU replaced by ReLU8 activations,
Softmax replaced by shiftmax,
and that was trained on the ImageNet dataset.
- Parameters:
quantized (bool, optional) – a boolean indicating whether the model should be loaded quantized or not. Defaults to True.
- Returns:
a Keras Model instance
- Return type:
keras.Model
- akida_models.deit_s16(input_shape=(224, 224, 3), classes=1000, distilled=False, include_top=True)[source]
Instantiates the DeiT-Small 16 architecture; that is a DeiT architecture with 6 attention heads, 12 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
distilled (bool, optional) – build model appending a distilled token. Defaults to False.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model
- akida_models.deit_b16(input_shape=(224, 224, 3), classes=1000, distilled=False, include_top=True)[source]
Instantiates the DeiT-B16 architecture; that is a DeiT architecture with 12 attention heads, 12 blocks and a patch size of 16.
- Parameters:
input_shape (tuple, optional) – input shape. Defaults to (224, 224, 3).
classes (int, optional) – number of classes. Defaults to 1000.
distilled (bool, optional) – build model appending a distilled token. Defaults to False.
include_top (bool, optional) – whether to include the final classifier network. Defaults to True.
- Returns:
the requested model
- Return type:
keras.Model