{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Advanced CNN2SNN tutorial\n\nThis tutorial provides insights into CNN2SNN for users who want to go deeper\ninto the quantization possibilities of Keras models. We recommend first\nreading the `user guide <../../user_guide/cnn2snn.html>`__ and the\n`CNN2SNN conversion flow tutorial `__ to get started with\nCNN2SNN.\n\nThe CNN2SNN toolkit offers an easy-to-use set of functions to obtain a quantized\nmodel from a native Keras model and to convert it to an Akida model compatible\nwith the Akida NSoC. The `quantize <../../api_reference/cnn2snn_apis.html#quantize>`__\nand `quantize_layer <../../api_reference/cnn2snn_apis.html#quantize-layer>`__\nhigh-level functions replace native Keras layers with custom CNN2SNN quantized\nlayers derived from their Keras equivalents. However, these functions do not\nlet you choose how the weights and activations are quantized. This tutorial\npresents an alternative low-level method to define models with customizable\nquantization of weights and activations.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Design a CNN2SNN quantized model\n\nUnlike the standard CNN2SNN flow where a native Keras model is quantized\nusing the ``quantize`` and ``quantize_layer`` functions, a customizable\nquantized model must be directly created using quantized layers.\n\nThe CNN2SNN toolkit supplies custom quantized layers to replace native\nKeras neural layers (Conv2D, SeparableConv2D and Dense) and\nactivations (ReLU).\n\n### Quantized neural layers\n\nThe CNN2SNN quantized neural layers are:\n\n* **QuantizedConv2D**, derived from ``keras.Conv2D``\n* **QuantizedSeparableConv2D**, derived from ``keras.SeparableConv2D``\n* **QuantizedDense**, derived from ``keras.Dense``\n\nThey are similar to their Keras counterparts, but have an additional\nargument: ``quantizer``. This parameter expects a *WeightQuantizer* object\nthat defines how the weights are discretized for a given bitwidth. Some\nquantizers are proposed in the CNN2SNN API:\n\n* **StdWeightQuantizer** and **TrainableStdWeightQuantizer**: these two\n quantizers use the standard deviation of the weights to compute\n the range on which weights are discretized. The *StdWeightQuantizer* uses\n a range equal to a fixed number of standard deviations. The trainable\n version uses a variable number of standard deviations where this number\n is a trainable parameter of the model.\n* **MaxQuantizer** and **MaxPerAxisQuantizer**: these discretize on\n a range based on the maximum of the absolute value of the weights. The\n *MaxQuantizer* discretizes all weights within a layer based on their global\n maximum, whereas the *MaxPerAxisQuantizer* discretizes each feature kernel,\n in practice the last dimension of the weights tensor, independently based\n on its local maximum.\n\nIf those quantizers do not fit your specific needs, you can\ncreate your own (cf. `weight-quantizer-section`).\n\n.. 
Note:: The ``QuantizedSeparableConv2D`` layer can accept two quantizers:\n one ``quantizer`` for the pointwise convolution and a\n ``quantizer_dw`` for the depthwise convolution. If the latter is\n not defined, it defaults to the same value as\n ``quantizer``.\n\n For Akida compatibility, the depthwise quantizer must be a\n per-tensor quantizer (i.e. all weights within the depthwise kernel\n are quantized together) and not a per-axis quantizer (i.e. each\n feature kernel is quantized independently). See more details\n `here `__.\n\n\n### Quantized activation layers\n\nSimilarly, a quantized activation layer returns values that are discretized\non a uniform grid. Two quantized activation layers are provided to replace\nthe native ReLU layers:\n\n* **ActivationDiscreteRelu**: a linear quantizer for ReLU, clipped at 6.\n* **QuantizedReLU**: a trainable activation layer where the activation threshold\n and the max clipping value are learned.\n\nIt is also possible to define a custom quantized activation layer. Details\nare given in the section `activation-section`.\n\n.. Note:: The ``quantize`` function is a high-level helper that automatically\n replaces the neural layers with their corresponding quantized\n counterparts, using\n `MaxPerAxisQuantizer <../../api_reference/cnn2snn_apis.html#maxperaxisquantizer>`__.\n The ReLU layers are substituted with\n `ActivationDiscreteRelu <../../api_reference/cnn2snn_apis.html#activationdiscreterelu>`__\n layers.\n\n### Load pre-trained weights from a native Keras model\n\nIn a standard quantization-aware training workflow, the pre-trained weights\nfrom a native Keras model are loaded into the equivalent quantized model.\nWeight quantizers and activation layers, such as the\n*TrainableStdWeightQuantizer* and the *QuantizedReLU*, have trainable variables\n(also called \"weights\" in Keras). For example, if a *Conv2D* layer with two\nweights (convolutional weights and bias) is replaced by a *QuantizedConv2D*\nwith a *TrainableStdWeightQuantizer*, the new quantized layer then has three\nweights: the convolutional weights, the bias and the quantizer variable.\nThus, the quantized CNN2SNN model has more weights in total than the\nequivalent native Keras model. Directly loading pre-trained weights from the\nnative Keras model using the Keras ``load_weights`` function will therefore\nfail, since it expects the source and destination models to have the same\nnumber of weights.\n\nTo circumvent this issue, the ``cnn2snn.load_partial_weights`` function loads\nthe available weights into the new model, even when it contains extra\nvariables, provided that the layer names in the two models are identical. We\ntherefore recommend using the same names in both the native and quantized\nmodels.\n\n\n### Create a quantized model\n\nHere, we illustrate how to create a quantized model equivalent to a native\nKeras model, using the weight quantizers and quantized activation layers\navailable in the CNN2SNN package. Although we present only one weight\nquantizer and one quantized activation, a quantized model can mix any\nquantizers and activations. For instance, every neural layer can have a\ndifferent weight quantizer with different parameters.\n\n"
]
},
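{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before building the quantized model, it may help to see what the two *Max* quantizers compute. The cell below is a plain NumPy sketch of the idea only, not the CNN2SNN implementation: a per-tensor scale factor (*MaxQuantizer*-like) uses the global maximum of the absolute weights, while the per-axis variant (*MaxPerAxisQuantizer*-like) computes one scale factor per feature kernel, i.e. the last dimension of the weights tensor.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\n# Toy convolution kernel: (height, width, input channels, filters)\nrng = np.random.default_rng(0)\nw = rng.standard_normal((3, 3, 1, 8)).astype(np.float32)\n\nbitwidth = 4\nkmax = 2**(bitwidth - 1) - 1  # 7 for a 4-bit quantizer\n\n# Per-tensor scale factor: one scalar for the whole layer\nsf_per_tensor = kmax / np.max(np.abs(w))\n\n# Per-axis scale factors: one value per feature kernel (last dimension)\nsf_per_axis = kmax / np.max(np.abs(w), axis=(0, 1, 2))\n\n# Discretized weights: round to the integer grid, then map back to floats\nwq = np.round(w * sf_per_tensor) / sf_per_tensor\n\nprint(sf_per_tensor, sf_per_axis.shape)"
]
},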
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from tensorflow.keras import Sequential, Input, layers\n\n# Create a native Keras toy model\nmodel_keras = Sequential([\n\n # Input layer\n Input(shape=(28, 28, 1)),\n\n # Conv + MaxPool + BatchNorm + ReLU\n layers.Conv2D(8, 3),\n layers.MaxPool2D(),\n layers.BatchNormalization(),\n layers.ReLU(),\n\n # Flatten + Dense + Softmax\n layers.Flatten(),\n layers.Dense(10),\n layers.Softmax()\n])\n\nmodel_keras.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from cnn2snn import quantization_layers as qlayers\nfrom cnn2snn import quantization_ops as qops\n\n# Prepare weight quantizers\nq1 = qops.MaxQuantizer(bitwidth=8)\nq2 = qops.MaxQuantizer(bitwidth=4)\n\n# Get layer names to set them in the quantized model\nnames = [layer.name for layer in model_keras.layers]\n\n# Create a quantized model, equivalent to the native Keras model\nmodel_quantized = Sequential([\n\n # Input layer\n Input(shape=(28, 28, 1)),\n\n # Conv + MaxPool + BatchNorm + ReLU\n qlayers.QuantizedConv2D(8, 3, quantizer=q1, name=names[0]),\n layers.MaxPool2D(name=names[1]),\n layers.BatchNormalization(name=names[2]),\n qlayers.QuantizedReLU(bitwidth=4, name=names[3]),\n\n # Flatten + Dense + Softmax\n layers.Flatten(name=names[4]),\n qlayers.QuantizedDense(10, quantizer=q2, name=names[5]),\n layers.Softmax(name=names[6]),\n])\n\nmodel_quantized.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As detailed in the summary, the *QuantizedReLU* layer has two trainable\nparameters. The quantized model therefore has two more parameters than the\nnative Keras model. To load weights from the native model, we use the\nprovided ``load_partial_weights`` function. Remember that both models must\nhave the same layer names.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from cnn2snn import load_partial_weights\n\nload_partial_weights(model_quantized, model_keras)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n## 2. Weight Quantizer Details\n\n### How a weight quantizer works\n\nThe purpose of a weight quantizer is to compute a tensor of discretized\nweights. It can be split into two operations:\n\n- an optional transformation applied to the weights, e.g. a shift or a\n non-linear transformation\n- the quantization of the weights.\n\nFor Akida compatibility, the weights must be discretized on a symmetric grid\ndefined by two parameters:\n\n- the **bitwidth** defines the number of unique values the weights can take.\n We define *kmax = 2^(bitwidth-1) - 1* as the maximum integer value of\n the symmetric quantization scheme. For instance, a 4-bit quantizer must\n return weights on a grid of 15 values, between -7 and 7. Here, *kmax = 7*.\n- the symmetric range on which the weights will be discretized (let's say\n between *-lim* and *lim*). Instead of working with the range directly, we\n use the **scale factor**, defined by *sf = kmax / lim*. For instance, with\n a 4-bit quantizer, the discretized weights will be on the grid\n [*-7/sf, -6/sf, ..., -1/sf, 0, 1/sf, ..., 6/sf, 7/sf*].\n The maximum discrete value *7/sf* is equal to *lim*, the limit of the range\n (see figure below).\n\n\n\nWhen training, the weight quantization is applied during the forward pass:\nthe weights are quantized and then used for the convolution or the fully\nconnected operation. However, during the back-propagation phase, the gradient\nis computed as if there were no quantization and the weights are updated\nbased on their original values before quantization. This is usually called\nthe \"Straight-Through Estimator\" (STE) and it can be implemented using the\n``tf.stop_gradient`` function.\n\n.. Note:: Remember that the weights are stored as standard float values in\n the model. To get the quantized weights, you must first retrieve\n the standard weights, using ``get_weights()``. Then, you can apply\n the ``quantize`` function of the weight quantizer to obtain the\n discretized weights. Finally, if you want to get the integer\n quantized values (between *-kmax* and *kmax*), you must multiply\n the discretized weights by the scale factor.\n\n### How to create a custom weight quantizer\n\nThe CNN2SNN API allows you to create a custom weight quantizer. It must\nbe derived from the ``WeightQuantizer`` base class and must override two\nmethods:\n\n- the ``scale_factor(w)`` method, returning the scale factor based on the\n input weights. The output must be a scalar or vectorial TensorFlow tensor.\n Per-tensor quantization will give a single scalar value, whereas\n per-axis quantization will yield a vector with a scale factor for each\n feature kernel.\n- the ``quantize(w)`` method, returning the discretized weights based on the\n scale factor and the bitwidth. A TensorFlow tensor must be returned. The\n two operations (optional transformation and quantization) are performed\n here.\n\n.. Note:: To correctly train a quantized model, it is important\n to implement the STE in the ``quantize`` function, by\n using ``tf.stop_gradient`` at the quantization operation.\n\nIf there is no need for the optional transformation in the custom quantizer,\nthe CNN2SNN toolkit provides a ``LinearWeightQuantizer`` that skips this\nstep. Its ``quantize`` function is already implemented and only the\n``scale_factor`` method must be overridden.\n\n\n### Why use a different quantizer\n\nLet's now see a use case where it is interesting to consider the behaviour of\ndifferent quantizers. The *MaxQuantizer* used in the *QuantizedDense* layer\nof the model above discretizes the weights based on their maximum value. The\ndefault *MaxPerAxisQuantizer* has a similar behaviour with an additional\nper-axis quantization design. If the weights contain outliers, i.e. weights\nwith a very large absolute value, this maximum-based quantization scheme can\nbe inappropriate. Let's look at it in practice: we retrieve the weights of\nthe QuantizedDense layer and compute their discretized counterparts using\nthe layer's *MaxQuantizer*.\n\n"
]
},
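{
"cell_type": "markdown",
"metadata": {},
"source": [
"The symmetric scheme described above fits in a few lines of NumPy. The cell below is an illustrative sketch of the arithmetic only (bitwidth, *kmax* and scale factor), not the toolkit implementation; it also shows how multiplying the discretized weights by the scale factor recovers the integer values.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\nbitwidth = 4\nkmax = 2**(bitwidth - 1) - 1  # 7: maximum integer value of the grid\nlim = 0.5                     # chosen symmetric range (-lim, lim)\nsf = kmax / lim               # scale factor\n\nw = np.array([-0.6, -0.25, 0.0, 0.1, 0.49], dtype=np.float32)\n\n# Integer values between -kmax and kmax (values beyond lim are clipped)\nk = np.clip(np.round(w * sf), -kmax, kmax)\n\n# Discretized weights on the grid [-7/sf, ..., 0, ..., 7/sf]\nwq = k / sf\n\n# Multiplying the discretized weights by sf recovers the integers\nprint(k)\nprint(wq)"
]
},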
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nimport tensorflow as tf\nimport matplotlib.pyplot as plt\n\n# Retrieve weights and quantizer of the QuantizedDense layer\ndense_name = names[5]\nquantizer = model_quantized.get_layer(dense_name).quantizer\nw = model_quantized.get_layer(dense_name).get_weights()[0]\n\n# Artificially add outliers\nw[:5, :] = 0.5\n\n# Compute discretized weights\nwq = quantizer.quantize(tf.constant(w)).numpy()\n\n\n# Show original and discretized weights histograms\ndef plot_discretized_weights(w, wq):\n xlim = [-0.095, 0.53]\n fig, (ax1, ax2) = plt.subplots(1, 2)\n ax1.hist(w.flatten(), bins=50)\n ax1.set_xlim(xlim)\n ax1.get_yaxis().set_visible(False)\n ax1.title.set_text(\"Original weights distribution\")\n\n vals, counts = np.unique(wq, return_counts=True)\n ax2.bar(vals, counts, 0.005)\n ax2.set_xlim(xlim)\n ax2.get_yaxis().set_visible(False)\n ax2.title.set_text(\"Discretized weights distribution\")\n\n plt.tight_layout()\n plt.show()\n\n\nplot_discretized_weights(w, wq)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The graphs above illustrate that a *MaxQuantizer* applied to weights with\noutliers keeps the full range of weights to discretize. In this use case,\nthe large majority of weights lies between -0.1 and 0.1 and is discretized\non only three quantization values, while the outliers at 0.5 are preserved\nafter quantization. If outlier weights don't carry much information in the\nlayer, it can be preferable to use another weight quantizer which \"forgets\"\nthem.\n\nThe *StdWeightQuantizer* is a good alternative for this use case: its\nquantization range is based on the standard deviation of the original\nweights. Since outliers have little impact on the standard deviation, they\ncan fall outside the quantization range and are clipped to its limits.\n\nIn this tutorial, instead of directly using the *StdWeightQuantizer*, we\nshow how to create a custom quantizer. The quantizer created below is a\nsimplified version of the *StdWeightQuantizer*, derived from the\n`LinearWeightQuantizer <../../api_reference/cnn2snn_apis.html#linearweightquantizer>`__.\nAs mentioned above, the ``quantize`` function is already implemented in\n*LinearWeightQuantizer*; only the ``scale_factor`` function must be\noverridden.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Define a custom weight quantizer\nclass CustomStdQuantizer(qops.LinearWeightQuantizer):\n \"\"\"This is a custom weight quantizer that defines the scale factor based\n on the standard deviation of the weights.\n\n The weights in range (-2*std, 2*std) are quantized into (2**bitwidth - 1)\n levels and the weights outside this range are clipped to \u00b12*std.\n \"\"\"\n\n def scale_factor(self, w):\n std_dev = tf.math.reduce_std(w)\n return self.kmax_ / (2 * std_dev)\n\n\nquantizer_std = CustomStdQuantizer(bitwidth=4)\n\n# Compute discretized weights\nwq = quantizer_std.quantize(tf.constant(w)).numpy()\n\n# Show original and discretized weights histograms\nplot_discretized_weights(w, wq)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two graphs above show that using a quantizer based on the standard\ndeviation can remove the outliers and give a finer discretization of the\nweights between -0.1 and 0.1. In this toy example, the *MaxQuantizer*\ndiscretizes the \"small\" weights on 3 quantization values, whereas the\n*CustomStdQuantizer* discretizes them on about 13-14 quantization values.\nDepending on whether the outliers need to be preserved, one quantizer or\nthe other is preferable.\n\nIn our experience, the *MaxPerAxisQuantizer* yields better results in most\nuse cases, especially for post-training quantization, which is why it is the\ndefault quantizer.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n## 3. Quantized Activation Layer Details\n\n### How a quantized activation works\n\nA quantized activation layer works like a ReLU layer with an additional\nquantization step. It can thus be seen as a succession of two operations:\n\n- a linear activation function, clipped between zero and a maximum\n activation value\n- the quantization, which is a ceiling operation. The activations are\n uniformly quantized between zero and the maximum activation value.\n\nThe linear activation function is defined by (cf. the blue line in the graph\nbelow):\n\n- the activation threshold: the value above which a neuron fires\n- the maximum activation value: any activation above it is clipped\n- the slope of the linear function: unlike a ReLU function with a fixed\n slope of 1, the CNN2SNN quantized activation accepts a different value.\n\nThe quantization operation is defined by one parameter: the bitwidth. The\nactivation function is quantized using the ceiling operator on\n*2^bitwidth - 1* positive activation levels. For instance, a 4-bit quantized\nactivation gives 15 activation levels (plus the zero activation) uniformly\ndistributed between zero and the maximum activation value (cf. the orange\nline in the graph).\n\n\n\n\nDuring training, the ceiling quantization is performed in the forward pass:\nthe activations are discretized and then transferred to the next layer.\nHowever, during the back-propagation phase, the gradient is computed as if\nthere were no quantization: only the gradient of the clipped linear\nactivation function (blue line above) is back-propagated. As for weight\nquantizers, the STE is implemented using the ``tf.stop_gradient``\nfunction.\n\n\n### How to create a custom quantized activation layer\n\nThe ``QuantizedActivation`` base class lets users easily create custom\nquantized activation layers. Three property functions must be overridden\nand return scalar TensorFlow objects (tf.constant, tf.Variable):\n\n- the ``threshold`` property, returning the activation threshold\n- the ``step_height`` property, returning the step height between two\n activation levels. It is defined as the maximum activation value divided\n by the number of activation levels (i.e. *2^bitwidth - 1*)\n- the ``step_width`` property, returning the step width as shown in the\n figure above. It is computed as *max_value / slope / (2^bitwidth - 1)*.\n\nNote that the slope of the linear activation function is equal to\n*step_height/step_width*.\n\n### Why use a different quantized activation\n\nThe default *ActivationDiscreteRelu* layer does not allow choosing a maximum\nactivation value. For instance, a 4-bit *ActivationDiscreteRelu* layer clips\nactivations to 6. In use cases where input potentials are rather small,\nlet's say smaller than 3, clipping to 6 means that the input potentials will\nbe quantized on only the first half of the possible activation levels. Let's\nsee an example.\n\n"
]
},
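{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a complement to the description above, the cell below is a minimal NumPy sketch of a quantized activation function. It assumes a slope of 1 (step height equal to step width) and a firing threshold of half a step, as in *ActivationDiscreteRelu*; the real layers additionally implement the STE with ``tf.stop_gradient``, which is omitted here.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\n\ndef sketch_quantized_relu(x, bitwidth=4, max_value=6.0):\n    # 2^bitwidth - 1 positive activation levels (15 for 4 bits)\n    levels = 2**bitwidth - 1\n    step = max_value / levels  # step height (= step width for a slope of 1)\n    # Rounding to the nearest level fires above half a step, i.e. the\n    # threshold; clipping caps the output at max_value\n    k = np.clip(np.round(x / step), 0, levels)\n    return k * step\n\n\nx = np.array([-1.0, 0.1, 1.0, 3.0, 7.0])\nprint(sketch_quantized_relu(x))"
]
},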
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Create an ActivationDiscreteRelu layer\nact_layer = qlayers.ActivationDiscreteRelu(bitwidth=4)\nprint(f\"Activation step height: {act_layer.step_height.numpy():.2f}\")\n\n# Compute quantized activations for input potentials between -1 and 7\ninput_potentials = np.arange(-1, 7, 0.01).astype(np.float32)\nactivations = act_layer(input_potentials)\n\n# Plot quantized activations\nplt.plot(input_potentials, activations.numpy(), '.')\nplt.vlines(3, 0, 6, 'k', (0, (1, 5)))\nplt.title(\"Quantized activations with ActivationDiscreteRelu\")\nplt.tight_layout()\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that, with input potentials smaller than 3, shown by the dotted\nvertical line, the output quantized activations take only 7 levels, with a\nstep height of 0.4. We don't benefit from all the quantization levels.\n\nOne option is to define a custom quantized activation layer where we can set\nthe maximum activation value. In our use case, we can set it to 3 in order\nto take advantage of all the quantization levels by reducing the step height.\nWe assume a slope of 1 and a threshold of half the step width (as set in\n*ActivationDiscreteRelu*), and override the three property functions.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class CustomQuantizedActivation(qlayers.QuantizedActivation):\n\n def __init__(self, bitwidth, max_value, **kwargs):\n super().__init__(bitwidth, **kwargs)\n self.step_height_ = tf.constant(max_value / self.levels)\n\n @property\n def step_height(self):\n return self.step_height_\n\n @property\n def step_width(self):\n return self.step_height_\n\n @property\n def threshold(self):\n return 0.5 * self.step_height_\n\n\n# Create a custom quantized activation layer\ncustom_act_layer = CustomQuantizedActivation(bitwidth=4, max_value=3)\nprint(f\"Custom activation step height: \"\n f\"{custom_act_layer.step_height.numpy():.2f}\")\n\n# Compute new quantized activations\nnew_activations = custom_act_layer(input_potentials)\n\n# Plot new quantized activations\nplt.plot(input_potentials, activations.numpy(), '.')\nplt.plot(input_potentials, new_activations.numpy(), '.')\nplt.vlines(3, 0, 6, 'k', (0, (1, 5)))\nplt.legend([\"ActivationDiscreteRelu\", \"CustomQuantizedActivation\"])\nplt.title(\"Quantized activations with CustomQuantizedActivation\")\nplt.tight_layout()\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The quantized activations are clipped to 3 as expected, and the step height\nis now 0.2. The activations between 0 and 3 are then discretized on 15\nactivation levels, versus 7 with *ActivationDiscreteRelu*. This new layer\ngives a finer discretization and is better suited to our use case with\nsmall potentials.\n\nNote also that the *QuantizedReLU* layer provided in the CNN2SNN toolkit has\ntwo trainable variables that learn the activation threshold and the step\nwidth. The step height is set equal to the step width to preserve a slope of\n1, as in the standard ReLU layer. This can be a suitable activation layer for\nuse cases where the maximum activation value is not known in advance: the\nlayer learns the best values to adapt to the input potentials.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}