{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Tips to set Akida learning parameters\n\nThis tutorial describes the Akida learning parameters and gives tips for\nchoosing initial values in an edge learning application. The KWS dataset\nand the DS-CNN-edge model are used as a classification example to illustrate\nthese tips.\n\nOne can consult the `KWS edge learning tutorial `_\nfor an introduction to Akida learning.\n\n.. Note:: The hints given in this tutorial do not guarantee the best\n performance. They should be seen as a starting point, before\n fine-tuning. Besides, even if these tips provide good estimates in\n most examples, they can't be guaranteed to work for every application.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Akida learning parameters\n\nTo be ready for learning, an Akida model must be composed of:\n\n 1. a feature extractor returning binary spikes: this part is usually trained\n using the CNN2SNN toolkit.\n 2. an Akida trainable layer added on top of the feature extractor: it must\n have 1-bit weights and several output neurons per class.\n\nThe last trainable layer must be correctly configured to get good learning\nperformance. The two main parameters to set are:\n\n - the number of weights\n - the number of neurons per class\n\nThe next sections describe these hyper-parameters and give tips for a\nfirst estimate of each.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Create Akida model\n\nFirst, we create the Akida feature extractor returning\nbinary spikes. From there, we will be able to estimate the parameters for\nthe trainable layer that will be added later.\n\nAfter loading the KWS dataset, we create the pre-trained Keras model and\nconvert it to an Akida model. We then remove the last layer to get the\nfeature extractor.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pickle\n\nfrom tensorflow.keras.utils import get_file\n\n# Fetch pre-processed data for 32 keywords\nfname = get_file(\n    fname='kws_preprocessed_all_words_except_backward_follow_forward.pkl',\n    origin=\n    \"http://data.brainchip.com/dataset-mirror/kws/kws_preprocessed_all_words_except_backward_follow_forward.pkl\",\n    cache_subdir='datasets/kws')\nwith open(fname, 'rb') as f:\n    [x_train_ak, y_train, _, _, _, _, word_to_index, _] = pickle.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from cnn2snn import convert\nfrom akida_models import ds_cnn_kws_pretrained\n\n# Instantiate a quantized model with pretrained quantized weights\nmodel = ds_cnn_kws_pretrained()\n\n# Convert to an Akida model\ninput_scaling = (255, 0)\nmodel_ak = convert(model, input_scaling=input_scaling)\n\n# Remove last layer\nmodel_ak.pop_layer()\nmodel_ak.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Estimate the required number of weights of the trainable layer\n\nThe number of weights corresponds to the number of connections for each\nneuron. The smaller the number of weights, the less specific the neurons will\nbe. Setting this parameter correctly is important.\n\nAlthough the last trainable layer hasn't been created yet, we can already\nestimate the number of weights. This estimate is based on the statistics of\nthe output spikes from the feature extractor. Intuitively, a sample producing\nN spikes at the end of the feature extractor could be perfectly represented\nby a neuron with N weights. We therefore use the median of the number of\noutput spikes over all samples.\n\nTo reduce computing time, a subset of the whole dataset may be\nsufficient to estimate the number of spikes. We then set the\nnumber of weights to a value slightly higher than the median of the number of\nspikes: we generally choose 1.2 x median of number of spikes, which seems to\ngive good results.\n\nFor a deeper analysis of the output spikes from the feature extractor, one\ncould look at the distribution of the number of spikes, either for all samples\nor for samples of each class separately. This analysis is not shown here.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\n# Forward samples to get the number of output spikes\n# Here, 10% of the training set is sufficient for a good estimation\nnum_samples_to_use = int(len(x_train_ak) / 10)\nspikes = model_ak.forward(x_train_ak[:num_samples_to_use])\n\n# Compute the median of the number of output spikes\nmedian_spikes = np.median(spikes.sum(axis=(1, 2, 3)))\nprint(f\"Median of number of spikes: {median_spikes}\")\n\n# Set the number of weights to 1.2 x median\nnum_weights = int(1.2 * median_spikes)\nprint(f\"The number of weights is then set to: {num_weights}\")"
]
},
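{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the rule above, the next cell applies the 1.2 x median\nheuristic to a synthetic spike tensor, so that the arithmetic can be checked\nwithout the KWS data. The random array ``fake_spikes``, its shape and its\nactivity rate are assumptions made for illustration only; they do not come\nfrom the DS-CNN feature extractor, and the ``fake_*`` names are chosen so that\nthe values computed above are left untouched.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in for the feature extractor output: 200 samples of\n",
"# binary spikes with shape (1, 1, 256) and about 30% activity (assumed values)\n",
"rng = np.random.default_rng(0)\n",
"fake_spikes = (rng.random((200, 1, 1, 256)) < 0.3).astype(np.int32)\n",
"\n",
"# Count the spikes of each sample, then apply the 1.2 x median heuristic\n",
"fake_spikes_per_sample = fake_spikes.sum(axis=(1, 2, 3))\n",
"fake_median = np.median(fake_spikes_per_sample)\n",
"fake_num_weights = int(1.2 * fake_median)\n",
"print(f'Median: {fake_median}, estimated number of weights: {fake_num_weights}')"
]
},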
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Estimate the number of neurons per class\n\nUnlike a standard CNN network where each class is represented by a single\noutput neuron, Akida native training requires several neurons for each\nclass to better represent the class variability. Choosing the right number of\nneurons per class is a trade-off: enough neurons to represent the\nvariability of each class, but not so many that memory and computing time\nincrease needlessly. This is similar to clustering algorithms where the\nclusters represent the distribution of the data. Note that, like clustering\nalgorithms, this analysis requires more samples per class than the number of\nneurons per class: only one neuron can learn per sample. With more neurons\nthan samples, the extra neurons are guaranteed to be wasted.\n\nOne direct option is to train the classification layer using the whole dataset\nwith different values of the number of neurons per class. The\nvalidation accuracy should increase with more neurons per class, then\nreach a plateau where adding more neurons has very little effect. Choosing the\nvalue where the accuracy begins to flatten is a good estimate.\n\nHowever, this method is very time consuming since it requires multiple\ntrainings using the whole dataset. Another option is to train on only a small\nnumber of classes. Rather than measuring accuracy, we measure the error\nbetween the potential of the matching neuron and the maximum theoretical\npotential, which equals the number of weights. Taking a simple example:\n\n- Let's say 3 neurons per class, with 180 weights\n- A sample of a given class returns 3 potentials for the 3 neurons of its\n class: [12, 153, 97]. The maximum potential is 153.\n- The error between the sample and the neuron is 180 - 153 = 27.\n- The loss is the sum of the errors for all samples of a class.\n\nVisualizing the loss for a given class as a function of the number of neurons\ngives a first estimate of the number of neurons per class.\nVisualizing the number of neurons that have learned as a function of the\nnumber of neurons per class provides a similar analysis.\n\nIn this tutorial, we only present this analysis for one class (word 'six').\n\n"
]
},
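{
"cell_type": "markdown",
"metadata": {},
"source": [
"The worked example above can be reproduced with plain NumPy, as a small\nsketch of how the error and the loss are obtained from the potentials. Only\nthe first row of ``example_potentials`` comes from the example; the two other\nrows are made-up values added so that the sum over several samples is visible.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"example_num_weights = 180\n",
"\n",
"# One row per sample, one column per neuron of the class. The first row is the\n",
"# example from the text; the other rows are hypothetical values.\n",
"example_potentials = np.array([[12, 153, 97],\n",
"                               [160, 40, 75],\n",
"                               [30, 88, 171]])\n",
"\n",
"# Error per sample: gap between the maximum theoretical potential (the number\n",
"# of weights) and the potential of the best-matching neuron\n",
"example_errors = example_num_weights - example_potentials.max(axis=-1)\n",
"\n",
"# Loss for the class: sum of the errors over its samples\n",
"example_loss = example_errors.sum()\n",
"print(f'Errors per sample: {example_errors}, loss: {example_loss}')"
]
},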
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from akida import Model, InputData, FullyConnected\n\n\ndef compute_losses(model,\n                   samples,\n                   neurons_per_class,\n                   num_weights,\n                   learning_competition=0.1,\n                   num_repetitions=1):\n    \"\"\"Compute losses after training an Akida FullyConnected layer for samples\n    of one class.\n\n    For each value of 'neurons_per_class', a training is performed, and the loss\n    and the number of neurons that have learned are returned.\n\n    Args:\n        model: an Akida model for feature extraction\n        samples: a NumPy array of input samples of one class\n        neurons_per_class: a 1-D iterable object storing the integer values of\n            the number of neurons to test\n        num_weights: the number of non-zero weights in each neuron\n        learning_competition: the learning competition of the trainable layer\n        num_repetitions: the number of times the training must be performed.\n            The training with the minimum loss will be kept.\n\n    Returns:\n        the losses and the numbers of neurons that have learned\n\n    \"\"\"\n    spikes = model.forward(samples)\n\n    def create_one_fc_model(num_neurons):\n        model_fc = Model()\n        model_fc.add(\n            InputData(name=\"input\",\n                      input_width=1,\n                      input_height=1,\n                      input_channels=spikes.shape[-1],\n                      input_bits=1))\n        layer_fc = FullyConnected(name='akida_edge_layer',\n                                  num_neurons=num_neurons,\n                                  activations_enabled=False)\n        model_fc.add(layer_fc)\n        model_fc.compile(num_weights=num_weights,\n                         learning_competition=learning_competition)\n        return model_fc\n\n    losses = np.zeros((len(neurons_per_class), num_repetitions))\n    num_learned_neurons = np.zeros((len(neurons_per_class), num_repetitions))\n    for idx, n in enumerate(neurons_per_class):\n        for i in range(num_repetitions):\n            model_fc = create_one_fc_model(num_neurons=n)\n\n            # Train model on a shuffled copy of the spikes\n            permut_spikes = np.random.permutation(spikes)\n            model_fc.fit(permut_spikes)\n\n            # Get max potentials\n            max_potentials = model_fc.forward(permut_spikes).max(axis=-1)\n            losses[idx, i] = np.sum(num_weights - max_potentials)\n\n            # Get threshold learning\n            th_learn = model_fc.get_layer('akida_edge_layer').get_variable(\n                'threshold_learning')\n            num_learned_neurons[idx, i] = np.sum(th_learn > 0)\n\n    # Average loss per sample, keeping the best repetition for each value\n    return losses.min(axis=1) / len(spikes), num_learned_neurons.min(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Choose a word to analyze and the values for the number of neurons\nword = 'six'\nneurons_per_class = [\n    2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90,\n    100, 150, 200, 250, 300, 350, 400, 450, 500, 750, 1000\n]\n\n# Compute the losses for word 'six' and different numbers of neurons\nidx_samples = (y_train == word_to_index[word])\nx_train_word = x_train_ak[idx_samples]\nlosses, num_learned_neurons = compute_losses(model_ak, x_train_word,\n                                             neurons_per_class, num_weights)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n\nplt.plot(np.array(neurons_per_class), losses)\nplt.xlabel(\"Number of neurons per class\")\nplt.ylabel(\"Loss\")\nplt.title(f\"Losses for samples of class '{word}'\")\nplt.grid(linestyle='--')\nplt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plt.plot(np.array(neurons_per_class), num_learned_neurons)\nplt.xlabel(\"Number of neurons per class\")\nplt.ylabel(\"Nb of neurons that have learned\")\nplt.title(f\"Nb of neurons that have learned for samples of class '{word}'\")\nplt.grid(linestyle='--')\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the figures above, we can see that the point of inflection occurs at\nabout 300 neurons. Setting the number of neurons per class to this value is a\ngood starting point: we expect a very good accuracy after training. Adding\nmore neurons won't improve the accuracy and will increase the computing time.\n\nHowever, one could gradually reduce the number of neurons per class to see its\ninfluence on the accuracy of a complete training. In the KWS edge tutorial, we\nfinally set this value to 50 because it is a good trade-off between computing\ntime and our target accuracy. The table below presents the validation accuracy\nafter training for different numbers of neurons, with the computing time given\nas a ratio to the 300-neuron case. We can see that there is no\nincrease for a number of neurons per class higher than 300. Note that in this\nuse case, the validation accuracy remains very high even for a small number of\nneurons per class: one should be aware that this small decrease in accuracy\ncannot be generalized to all use cases.\n\n+-------------+----------+-------------+\n| Nb. neurons | Accuracy | Time ratio  |\n+=============+==========+=============+\n| 10          | 91.6 %   | 0.83        |\n+-------------+----------+-------------+\n| 20          | 91.8 %   | 0.84        |\n+-------------+----------+-------------+\n| 50          | 92.1 %   | 0.86        |\n+-------------+----------+-------------+\n| 100         | 92.3 %   | 0.89        |\n+-------------+----------+-------------+\n| 200         | 92.5 %   | 0.94        |\n+-------------+----------+-------------+\n| 300         | 92.6 %   | 1           |\n+-------------+----------+-------------+\n| 400         | 92.5 %   | 1.05        |\n+-------------+----------+-------------+\n| 500         | 92.5 %   | 1.10        |\n+-------------+----------+-------------+\n\n"
]
}
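,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to turn the table above into a choice is sketched below: keep\nthe smallest number of neurons whose accuracy stays within a tolerance of the\nbest one. This is an illustrative assumption, not the procedure used in the\nKWS edge tutorial; the helper name ``smallest_within_tolerance`` and the\ntolerance of 0.5 accuracy points are hypothetical. With the values from the\ntable and that tolerance, the helper selects 50 neurons per class.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Values copied from the table above\n",
"table_neurons = np.array([10, 20, 50, 100, 200, 300, 400, 500])\n",
"table_accuracy = np.array([91.6, 91.8, 92.1, 92.3, 92.5, 92.6, 92.5, 92.5])\n",
"\n",
"\n",
"def smallest_within_tolerance(neurons, accuracy, tolerance):\n",
"    # Hypothetical helper: smallest neuron count whose accuracy is within\n",
"    # 'tolerance' points of the best accuracy\n",
"    gap = accuracy.max() - accuracy\n",
"    # Small epsilon guards against floating-point rounding at the boundary\n",
"    candidates = neurons[gap <= tolerance + 1e-9]\n",
"    return int(candidates.min())\n",
"\n",
"\n",
"print(smallest_within_tolerance(table_neurons, table_accuracy, tolerance=0.5))"
]
}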
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}