Advanced QuantizeML tutorial

This tutorial provides a comprehensive understanding of quantization with the QuantizeML Python package. Refer to the QuantizeML user guide and the Global Akida workflow tutorial for additional resources.

The QuantizeML Python package provides a user-friendly collection of functions for obtaining a quantized model. The quantize function replaces TF-Keras layers with quantized, integer-only layers from QuantizeML.
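
In its simplest form, and assuming a float TF-Keras model named model is already built, quantization is a single call using the default 8-bit scheme (a minimal sketch):

from quantizeml.models import quantize

# Quantize "model" with the default 8-bit scheme and random calibration samples.
quantized_model = quantize(model)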

1. Defining a quantization scheme

The quantization scheme refers to all the parameters used for quantization: the quantization method, such as per-axis or per-tensor, and the bitwidths used for inputs, outputs and weights.

The first part of this section explains how to define a homogeneous quantization scheme that applies to all layers using QuantizationParams; the second part explains how to fully customize the quantization scheme using a configuration file.

1.1. The quantization parameters

The easiest way to customize quantization is to use the qparams parameter of the quantize function, which takes a QuantizationParams object.

from quantizeml.models import QuantizationParams

qparams = QuantizationParams(input_weight_bits=8, weight_bits=8, activation_bits=8,
                             per_tensor_activations=False, output_bits=8, buffer_bits=32)

By default, the adopted quantization scheme is 8-bit with per-axis activations, but every parameter can be set to a different value. The following list gives a detailed description of the parameters, with tips on how to set them (a usage example follows the list):

  • input_weight_bits is the bitwidth used to quantize the weights of the first layer. It is usually set to 8, which helps preserve the overall accuracy.

  • weight_bits is the bitwidth used to quantize all other weights. It is usually set to 8 (Akida 2.0) or 4 (Akida 1.0).

  • activation_bits is the bitwidth used to quantize all ReLU activations. It is usually set to 8 (Akida 2.0) or 4 (Akida 1.0) but can be lower for edge learning (1-bit).

  • per_tensor_activations is a boolean that selects per-axis (default, False) or per-tensor quantization for ReLU activations. Per-axis quantization usually provides more accurate results but can make the model more challenging to calibrate. Note that Akida 1.0 only supports per-tensor activations.

  • output_bits is the bitwidth used to quantize intermediate results in OutputQuantizer. Go back to the user guide quantization flow for details about this process.

  • buffer_bits is the maximum bitwidth allowed for low-level integer operations (e.g. matrix multiplications). It is set to 32 and should not be changed, as this is what the Akida hardware target uses.
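
For example, a 4-bit scheme following the Akida 1.0 guidance above could be defined as shown below (an illustrative sketch, reusing the model placeholder from the earlier snippet):

from quantizeml.models import quantize, QuantizationParams

# Akida 1.0 oriented scheme: 8-bit weights for the first layer, 4-bit weights
# and activations elsewhere, and per-tensor activations (required by Akida 1.0).
qparams_akida1 = QuantizationParams(input_weight_bits=8, weight_bits=4,
                                    activation_bits=4, per_tensor_activations=True)
quantized_model = quantize(model, qparams=qparams_akida1)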

Note

It is recommended to quantize a model to 8-bit or 4-bit to ensure it is Akida hardware compatible.

Warning

QuantizationParams is only applied the first time a model is quantized. If you want to re-quantize a model, you must provide a complete q_config.

1.2. Using a configuration file

Quantization can be further customized via a JSON configuration passed to the q_config parameter of the quantize function. This usage should be limited to targeted customization, as writing a whole configuration from scratch is error prone. An example of targeted customization is setting the quantization bitwidth of the output of a feature extractor to 1, which enables edge learning (an Akida 1.0 feature only).

Warning

When provided, the configuration file has priority over the other arguments. As a result, it must contain all quantization parameters for the layers it configures: you cannot rely on argument defaults to fill in non-specified values.

The following code snippets show what a configuration file looks like and how to edit it to customize quantization.

import tf_keras as keras
import tensorflow as tf
import json
from quantizeml.models import quantize, dump_config, QuantizationParams

# Define an example model with few layers to keep what follows readable
input = keras.layers.Input((28, 28, 3), dtype=tf.uint8)
x = keras.layers.Rescaling(scale=1. / 255, name="rescale")(input)
x = keras.layers.Conv2D(filters=16, kernel_size=3, name="input_conv")(x)
x = keras.layers.DepthwiseConv2D(kernel_size=3, name="dw_conv")(x)
x = keras.layers.Conv2D(filters=32, kernel_size=1, name="pw_conv")(x)
x = keras.layers.ReLU(name="relu")(x)
x = keras.layers.Dense(units=10, name="dense")(x)

model = keras.Model(input, x)

# Define QuantizationParams with specific values just for the sake of understanding the JSON
# configuration that follows.
qparams = QuantizationParams(input_weight_bits=16, weight_bits=4, activation_bits=6, output_bits=12,
                             per_tensor_activations=True, buffer_bits=24)

# Quantize the model
quantized_model = quantize(model, qparams=qparams)
1024/1024 [==============================] - 1s 991us/step
quantized_model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 28, 28, 3)]       0

 rescale (QuantizedRescalin  (None, 28, 28, 3)         0
 g)

 input_conv (QuantizedConv2  (None, 26, 26, 16)        450
 D)

 dw_conv (QuantizedDepthwis  (None, 24, 24, 16)        162
 eConv2D)

 pw_conv (QuantizedConv2D)   (None, 24, 24, 32)        544

 relu (QuantizedReLU)        (None, 24, 24, 32)        2

 dense (QuantizedDense)      (None, 24, 24, 10)        330

 dequantizer (Dequantizer)   (None, 24, 24, 10)        0

=================================================================
Total params: 1488 (5.81 KB)
Trainable params: 1482 (5.79 KB)
Non-trainable params: 6 (24.00 Byte)
_________________________________________________________________
# Dump the configuration
config = dump_config(quantized_model)

# Display in a JSON format for readability
print(json.dumps(config, indent=4))
{
    "input_conv": {
        "output_quantizer": {
            "bitwidth": 12,
            "axis": "per-tensor",
            "buffer_bitwidth": 24
        },
        "weight_quantizer": {
            "bitwidth": 16
        },
        "buffer_bitwidth": 24
    },
    "dw_conv": {
        "output_quantizer": {
            "bitwidth": 12,
            "axis": "per-tensor",
            "buffer_bitwidth": 24
        },
        "weight_quantizer": {
            "bitwidth": 4,
            "axis": -2
        },
        "buffer_bitwidth": 24
    },
    "pw_conv": {
        "weight_quantizer": {
            "bitwidth": 4
        },
        "buffer_bitwidth": 24
    },
    "relu": {
        "output_quantizer": {
            "bitwidth": 6,
            "signed": false,
            "axis": "per-tensor",
            "buffer_bitwidth": 24
        },
        "buffer_bitwidth": 24
    },
    "dense": {
        "weight_quantizer": {
            "bitwidth": 4
        },
        "buffer_bitwidth": 24
    }
}

Explaining the above configuration:

  • the configuration dictionary is indexed by layer names.

  • the input and depthwise convolution layers each have an OutputQuantizer set to 12-bit (output_bits=12) to reduce the bitwidth of intermediate potentials before the layer that follows (the OutputQuantizer is automatically added when calling quantize).

  • the first layer (input_conv) has its weights quantized to 16-bit (input_weight_bits=16), while the depthwise layer weights are quantized to 4-bit per-axis (the default for weights). The depthwise axis is -2 because the TF-Keras depthwise kernel shape is (Kx, Ky, F, 1), so the channel dimension is at index -2.

  • the pointwise layer has weights quantized to 4-bit (weight_bits=4), but the quantization axis is not specified because it defaults to -1 for per-axis quantization. It would need to be set to None for per-tensor quantization (see the sketch after this list).

  • the ReLU activation is quantized to 6-bit per-tensor (activation_bits=6, per_tensor_activations=True).

  • all buffer_bitwidth values are set to 24 (buffer_bits=24).
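
As mentioned in the pointwise bullet above, per-tensor weight quantization could be obtained by explicitly setting the axis to None in the dumped configuration. This is illustrative only and is not applied in the rest of this tutorial, which edits the ReLU configuration instead:

# Illustrative only: switch the pointwise weights to per-tensor quantization.
config["pw_conv"]["weight_quantizer"]["axis"] = None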

The configuration will now be edited and used to quantize the float model via the q_config parameter.

# Edit the ReLU activation configuration
config["relu"]["output_quantizer"]['bitwidth'] = 1
config["relu"]["output_quantizer"]['axis'] = 'per-axis'
config["relu"]["output_quantizer"]['buffer_bitwidth'] = 32
config["relu"]['buffer_bitwidth'] = 32

# Drop other layers configurations
del config['dw_conv']
del config['pw_conv']
del config['dense']

# The configuration is now limited to the ReLU activation
print(json.dumps(config, indent=4))
{
    "input_conv": {
        "output_quantizer": {
            "bitwidth": 12,
            "axis": "per-tensor",
            "buffer_bitwidth": 24
        },
        "weight_quantizer": {
            "bitwidth": 16
        },
        "buffer_bitwidth": 24
    },
    "relu": {
        "output_quantizer": {
            "bitwidth": 1,
            "signed": false,
            "axis": "per-axis",
            "buffer_bitwidth": 32
        },
        "buffer_bitwidth": 32
    }
}

Now quantize, setting both the qparams and q_config parameters: the activation will be quantized using the given configuration and the other layers will use what is provided in qparams.

new_quantized_model = quantize(model, q_config=config, qparams=qparams)
1024/1024 [==============================] - 1s 992us/step
# Dump the new configuration
new_config = dump_config(new_quantized_model)

# Display in a JSON format for readability
print(json.dumps(new_config, indent=4))
{
    "input_conv": {
        "output_quantizer": {
            "bitwidth": 12,
            "axis": "per-tensor",
            "buffer_bitwidth": 24
        },
        "weight_quantizer": {
            "bitwidth": 16
        },
        "buffer_bitwidth": 24
    },
    "dw_conv": {
        "output_quantizer": {
            "bitwidth": 12,
            "axis": "per-tensor",
            "buffer_bitwidth": 24
        },
        "weight_quantizer": {
            "bitwidth": 4,
            "axis": -2
        },
        "buffer_bitwidth": 24
    },
    "pw_conv": {
        "weight_quantizer": {
            "bitwidth": 4
        },
        "buffer_bitwidth": 24
    },
    "relu": {
        "output_quantizer": {
            "bitwidth": 1,
            "signed": false,
            "axis": "per-axis",
            "buffer_bitwidth": 32
        },
        "buffer_bitwidth": 32
    },
    "dense": {
        "weight_quantizer": {
            "bitwidth": 4
        },
        "buffer_bitwidth": 24
    }
}

The new configuration contains both the manually edited configuration for the activation and the configuration derived from the quantization parameters for the other layers.

2. Calibration

2.1. Why is calibration required?

OutputQuantizer layers are added between layer blocks during quantization in order to decrease the bitwidth of intermediate potentials and prevent saturation. Calibration is the process of defining the best possible quantization range for each OutputQuantizer.

Calibration statistically determines the quantization range by passing samples through the float model and observing the intermediate output values. The quantization range is stored in the range_max variable. The calibration algorithm used in QuantizeML is based on a moving maximum: range_max is initialized with the maximum value of the first batch of samples (per-axis or per-tensor, depending on the quantization scheme), and the following batches update range_max with a moving momentum strategy (momentum is set to 0.9). Refer to the following pseudo code:

samples_max = reduce_max(samples)
delta = (previous_range_max - samples_max) * (1 - momentum)
new_range_max = previous_range_max - delta
# i.e. new_range_max = momentum * previous_range_max + (1 - momentum) * samples_max
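
The update rule can be sketched numerically with NumPy (illustrative only; the real implementation runs per-axis or per-tensor inside each OutputQuantizer):

import numpy as np

momentum = 0.9
# Five batches of random float activations standing in for intermediate outputs.
batches = [np.random.uniform(0, 6, size=(100, 32)) for _ in range(5)]

# range_max is initialized with the maximum of the first batch, then updated
# with the moving-maximum strategy on the following batches.
range_max = np.max(batches[0])
for samples in batches[1:]:
    samples_max = np.max(samples)
    range_max = momentum * range_max + (1 - momentum) * samples_max
print(f"Calibrated range_max: {range_max:.3f}")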

In QuantizeML, as in other frameworks, calibration happens at the same time as quantization, so the quantize function comes with calibration parameters: samples, num_samples, batch_size and epochs. The sections below describe how to set these parameters.

Note

Calibration does not require any label or sample annotation and is therefore different from training.

2.2. The samples

There are two types of calibration samples: randomly generated samples or real samples.

When the samples parameter of quantize is left to the default None value, random samples will be generated using the num_samples value (default is 1024). When the model input shape has 1 or 3 channels, which corresponds to an image, the random sample values are unsigned 8-bit integers in the [0, 255] range. If the channel dimension is not 1 or 3, the generated samples are 8-bit signed integers in the [-128, 127] range. If that does not correspond to the range expected by your model, either add a Rescaling layer to your model using the insert_rescaling helper or provide real samples.

Real samples are often (but not necessarily) taken from the training dataset and should be the preferred option for calibration, as they will always lead to better results.
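
For instance, a calibration set could simply be a slice of the training images (a hedged sketch where x_train is a hypothetical array of uint8 images matching the model input shape; labels are not needed):

import numpy as np

# Hypothetical stand-in for real training images.
x_train = np.random.randint(0, 256, size=(5000, 28, 28, 3), dtype=np.uint8)

# Keep at least 1024 samples for calibration.
calibration_samples = x_train[:1024]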

Samples are batched before being passed to the model for calibration. It is recommended to use at least 1024 samples. When samples are provided, num_samples is only used to compute the number of calibration steps:

if batch_size is None:
    steps = num_samples
else:
    steps = np.ceil(num_samples / batch_size)

2.3. Other calibration parameters

batch_size

Setting a large enough batch_size is important, as it impacts the range_max initialization that is performed on the first batch of samples. The recommended value is 100.

epochs

This is the number of iterations over the calibration samples. Increasing the value allows for more updates of the range_max variables thanks to the momentum policy, without requiring a huge amount of samples. The recommended value is 2.
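
Putting these recommendations together, a calibration setup could look like the following sketch (reusing the calibration_samples defined earlier):

# 1024 real samples, batched by 100 and iterated over twice during calibration.
quantized_model = quantize(model, samples=calibration_samples,
                           num_samples=1024, batch_size=100, epochs=2)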

3. Handling input types

In standard machine learning frameworks such as TF-Keras or PyTorch, models usually expect floating-point inputs. In an embedded software and deployment context, however, floating-point values might not be supported. That is the case for Akida hardware, which only accepts integer inputs.

QuantizeML provides an InputQuantizer layer that can be added at the model input in order to convert floating-point inputs to the integer inputs expected by Akida. The InputQuantizer layer performs input quantization by applying a scale and an offset to the inputs. These values are computed during calibration by observing the input sample statistics, and the quantization range is determined by the quantization dtype given in the quantization parameters, see QuantizationParams.input_dtype. Because Akida only supports a channel last data format, the InputQuantizer layer can also convert the data format from channel first to channel last. This only applies to models coming from PyTorch through ONNX.

The InputQuantizer layer added during quantization is later converted to an Akida.Quantizer layer.

While this allows models that will be deployed on Akida hardware to be prototyped quickly, it is often preferable to handle input quantization and data format conversion natively, to avoid the extra scaling, offset and transpose operations. It is also key to train a model with the same data type as the target application expects.

3.1. InputQuantizer for floating-point inputs

To illustrate this, let’s consider image classification models. While usually trained with float32 data, for deployment, sensors will provide unsigned 8-bit integer images in a channel last format. In that case, it is better to define the model with a uint8 input type (passing the right dtype to the Input layer), quantize with a uint8 dtype and avoid adding an InputQuantizer layer altogether.

Let’s first look at what happens without explicitly setting the Input dtype.

# Define an example model with few layers that could be used for image classification
input = keras.layers.Input((28, 28, 3))
x = keras.layers.Rescaling(scale=1. / 255, name="rescale")(input)
x = keras.layers.Conv2D(16, 3, strides=2, padding="same", name="input_conv")(x)
x = keras.layers.ReLU(name="relu_0")(x)
x = keras.layers.DepthwiseConv2D(3, strides=2, padding="same", name="dw_conv")(x)
x = keras.layers.Conv2D(32, 1, name="pw_conv")(x)
x = keras.layers.ReLU(name="relu_1")(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(units=10, name="dense")(x)

model = keras.Model(input, x)

# Define QuantizationParams with explicit dtype uint8 (which is the default)
qparams = QuantizationParams(input_dtype='uint8')

# Define random calibration samples in range [0, 255] as float32 (it could be any range but this is
# kept simple for the sake of the example)
import numpy as np

calibration_samples = np.random.randint(0, 256, size=(256, 28, 28, 3)).astype(np.float32)

# Quantize the model
quantized_model = quantize(model, qparams=qparams, num_samples=256,
                           samples=calibration_samples, batch_size=64)
4/4 [==============================] - 0s 2ms/step

As the model Input layer is not explicitly typed (it defaults to float32), an InputQuantizer layer has been added in second position:

quantized_model.summary()
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_2 (InputLayer)        [(None, 28, 28, 3)]       0

 input_quantizer (InputQuan  (None, 28, 28, 3)         6
 tizer)

 rescale (QuantizedRescalin  (None, 28, 28, 3)         0
 g)

 input_conv (QuantizedConv2  (None, 14, 14, 16)        448
 D)

 relu_0 (QuantizedReLU)      (None, 14, 14, 16)        32

 dw_conv (QuantizedDepthwis  (None, 7, 7, 16)          192
 eConv2D)

 pw_conv (QuantizedConv2D)   (None, 7, 7, 32)          544

 relu_1 (QuantizedReLU)      (None, 7, 7, 32)          64

 flatten (QuantizedFlatten)  (None, 1568)              0

 dense (QuantizedDense)      (None, 10)                15690

 dequantizer_2 (Dequantizer  (None, 10)                0
 )

=================================================================
Total params: 16976 (66.31 KB)
Trainable params: 16842 (65.79 KB)
Non-trainable params: 134 (536.00 Byte)
_________________________________________________________________

Let’s take a look at what the InputQuantizer layer does during the conversion process.

from quantizeml.models import record_quantization_variables


def print_input_quantizer_params(q_model):
    # Record variables to display them below.
    # Note this is only needed for tutorial purposes and handled automatically during standard
    # conversion process.
    record_quantization_variables(q_model)

    print('InputQuantizer parameters computed during calibration:')
    print(f'  Bitwidth: {q_model.layers[1].bitwidth}')
    print(f'  Signedness: {q_model.layers[1].signed}')
    print(f'  Scale (per-channel): {2 ** q_model.layers[1].frac_bits.value.numpy()}')
    if hasattr(q_model.layers[1], 'zero_points'):
        print(f'  Offset (per-channel): {q_model.layers[1].zero_points.value.values.numpy()}')


print_input_quantizer_params(quantized_model)
InputQuantizer parameters computed during calibration:
  Bitwidth: 8
  Signedness: False
  Scale (per-channel): [1. 1. 1.]
  Offset (per-channel): [0. 0. 0.]

Since inputs were created in the [0, 255] range, the InputQuantizer layer has learned to quantize inputs to uint8 with a scale of 1 and no offset as expected.

3.2. Conversion to Akida with floating-point data

Here is another example where the model naturally takes float32 input data (e.g. a time frequency map). With Akida, this will be quantized to int8.

float_input = keras.layers.Input((10, 25, 2))
y = keras.layers.Conv2D(16, 3, strides=2, padding="same", name="input_conv")(float_input)
y = keras.layers.ReLU(name="relu_0")(y)
y = keras.layers.DepthwiseConv2D(3, strides=2, padding="same", name="dw_conv")(y)
y = keras.layers.Conv2D(32, 1, name="pw_conv")(y)
y = keras.layers.ReLU(name="relu_1")(y)
y = keras.layers.Flatten()(y)
y = keras.layers.Dense(units=10, name="dense")(y)

model = keras.Model(float_input, y)

# Explicitly set input dtype to int8 and recompute calibration samples in the [-1, 1) range as an
# example.
qparams = QuantizationParams(input_dtype='int8')
calibration_samples_int8 = np.random.uniform(-1.0, 1.0, size=(256, 10, 25, 2)).astype(np.float32)

# Quantize the model
quantized_model = quantize(model, qparams=qparams, num_samples=256,
                           samples=calibration_samples_int8, batch_size=64)
4/4 [==============================] - 0s 1ms/step

Take a look at the InputQuantizer:

quantized_model.summary()
Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_3 (InputLayer)        [(None, 10, 25, 2)]       0

 input_quantizer_1 (InputQu  (None, 10, 25, 2)         4
 antizer)

 input_conv (QuantizedConv2  (None, 5, 13, 16)         304
 D)

 relu_0 (QuantizedReLU)      (None, 5, 13, 16)         32

 dw_conv (QuantizedDepthwis  (None, 3, 7, 16)          192
 eConv2D)

 pw_conv (QuantizedConv2D)   (None, 3, 7, 32)          544

 relu_1 (QuantizedReLU)      (None, 3, 7, 32)          64

 flatten_1 (QuantizedFlatte  (None, 672)               0
 n)

 dense (QuantizedDense)      (None, 10)                6730

 dequantizer_3 (Dequantizer  (None, 10)                0
 )

=================================================================
Total params: 7870 (30.74 KB)
Trainable params: 7738 (30.23 KB)
Non-trainable params: 132 (528.00 Byte)
_________________________________________________________________
print_input_quantizer_params(quantized_model)
InputQuantizer parameters computed during calibration:
  Bitwidth: 8
  Signedness: True
  Scale (per-channel): [128. 128.]

Proceed with conversion to an Akida model:

from cnn2snn import convert

akida_model = convert(quantized_model)
akida_model.summary()
                Model Summary
______________________________________________
Input shape  Output shape  Sequences  Layers
==============================================
[10, 25, 2]  [1, 1, 10]    1          7
______________________________________________

_____________________________________________________________
Layer (type)                   Output shape  Kernel shape

======= SW/input_quantizer_1-dequantizer_3 (Software) =======

input_quantizer_1 (Quantizer)  [10, 25, 2]   N/A
_____________________________________________________________
InputData_2 (InputData)        [10, 25, 2]   N/A
_____________________________________________________________
input_conv (Conv2D)            [5, 13, 16]   (3, 3, 2, 16)
_____________________________________________________________
dw_conv (DepthwiseConv2D)      [3, 7, 16]    (3, 3, 16, 1)
_____________________________________________________________
pw_conv (Conv2D)               [3, 7, 32]    (1, 1, 16, 32)
_____________________________________________________________
dense (Dense1D)                [1, 1, 10]    (672, 10)
_____________________________________________________________
dequantizer_3 (Dequantizer)    [1, 1, 10]    N/A
_____________________________________________________________

The model can be deployed on Akida hardware, with the extra scaling and offset handled by the converted Quantizer layer (which runs in software, as shown in the summary above).

3.3. Preventing the InputQuantizer

When using images, it makes more sense to avoid the unnecessary InputQuantizer layer (as seen above). Here, we show how to define a model with a uint8-typed input so that this extra layer is not added. Notice the dtype added to the Input layer below.

typed_input = keras.layers.Input((28, 28, 3), dtype=tf.uint8)
z = keras.layers.Rescaling(scale=1. / 255, name="rescale")(typed_input)
z = keras.layers.Conv2D(16, 3, strides=2, padding="same", name="input_conv")(z)
z = keras.layers.ReLU(name="relu_0")(z)
z = keras.layers.DepthwiseConv2D(3, strides=2, padding="same", name="dw_conv")(z)
z = keras.layers.Conv2D(32, 1, name="pw_conv")(z)
z = keras.layers.ReLU(name="relu_1")(z)
z = keras.layers.Flatten()(z)
z = keras.layers.Dense(units=10, name="dense")(z)

model = keras.Model(typed_input, z)

quantized_model = quantize(model, num_samples=256, samples=calibration_samples, batch_size=64)

quantized_model.summary()
4/4 [==============================] - 0s 1ms/step
Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_4 (InputLayer)        [(None, 28, 28, 3)]       0

 rescale (QuantizedRescalin  (None, 28, 28, 3)         0
 g)

 input_conv (QuantizedConv2  (None, 14, 14, 16)        448
 D)

 relu_0 (QuantizedReLU)      (None, 14, 14, 16)        32

 dw_conv (QuantizedDepthwis  (None, 7, 7, 16)          192
 eConv2D)

 pw_conv (QuantizedConv2D)   (None, 7, 7, 32)          544

 relu_1 (QuantizedReLU)      (None, 7, 7, 32)          64

 flatten_2 (QuantizedFlatte  (None, 1568)              0
 n)

 dense (QuantizedDense)      (None, 10)                15690

 dequantizer_4 (Dequantizer  (None, 10)                0
 )

=================================================================
Total params: 16970 (66.29 KB)
Trainable params: 16842 (65.79 KB)
Non-trainable params: 128 (512.00 Byte)
_________________________________________________________________

As expected, the model does not contain any InputQuantizer layer, since both the model input and the quantization parameters are typed as uint8: the quantization algorithm recognizes that inputs are already in the right format and does not need to quantize them. Conversion to an Akida model will therefore not add any Akida.Quantizer layer, allowing for a more efficient deployment.
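
To check this, conversion can be run as in the previous section (a short sketch; the resulting summary is expected to start directly with the model layers, without a Quantizer):

from cnn2snn import convert

akida_model = convert(quantized_model)
akida_model.summary()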
