Note
Go to the end to download the full example code.
AkidaNet/ImageNet inference
This tutorial presents how to convert, map, and capture performance from AKD1000 Hardware using an AkidaNet model.
AkidaNet architecture is a MobileNet v1-inspired architecture optimized for implementation on Akida 1.0: it exploits the richer expressive power of standard convolutions in early layers, but uses separable convolutions in later layers where filter memory is limiting.
As ImageNet images are not publicly available, performance is assessed using a set of 10 copyright free images that were found on Google using ImageNet class names.
Note
This tutorial uses an Akida 1.0 architecture to show AKD1000 mapping and performance. See the dedicated tutorial for 1.0 and 2.0 differences.
1. Dataset preparation
Test images all have at least 256 pixels in the smallest dimension. They must
be preprocessed to fit in the model. The imagenet.preprocessing.get_preprocessed_samples
function loads and preprocesses (decodes, crops and extracts a square
224x224x3 patch from an input image) a set of 10 ImageNet-like images.
Note
Input size is here set to 224x224x3 as this is what is used by the model presented in the next section.
import akida
import numpy as np
from akida_models.imagenet import get_preprocessed_samples
# Model specification and hyperparameters
NUM_CHANNELS = 3
IMAGE_SIZE = 224
# Load the preprocessed images and their corresponding labels for the test set
x_test, labels_test = get_preprocessed_samples(IMAGE_SIZE, NUM_CHANNELS)
print(f'{x_test.shape[0]} images and their labels are loaded and preprocessed.')
Downloading data from https://data.brainchip.com/dataset-mirror/imagenet_like/imagenet_like.zip.
0/20418307 [..............................] - ETA: 0s
114688/20418307 [..............................] - ETA: 9s
868352/20418307 [>.............................] - ETA: 2s
3170304/20418307 [===>..........................] - ETA: 0s
4579328/20418307 [=====>........................] - ETA: 0s
6004736/20418307 [=======>......................] - ETA: 0s
8634368/20418307 [===========>..................] - ETA: 0s
10510336/20418307 [==============>...............] - ETA: 0s
12738560/20418307 [=================>............] - ETA: 0s
14180352/20418307 [===================>..........] - ETA: 0s
16187392/20418307 [======================>.......] - ETA: 0s
18407424/20418307 [==========================>...] - ETA: 0s
20418307/20418307 [==============================] - 1s 0us/step
Download complete.
10 images and their labels are loaded and preprocessed.
2. Pretrained quantized model
The Akida model zoo contains a pretrained quantized helper.
The quantization scheme for this model is the following:
the first layer has 8-bit weights,
all other layers have 4-bit weights,
all activations are 4-bit.
from cnn2snn import set_akida_version, AkidaVersion
from akida_models import akidanet_imagenet_pretrained
# Use a quantized model with pretrained quantized weights
with set_akida_version(AkidaVersion.v1):
model_keras_quantized_pretrained = akidanet_imagenet_pretrained(0.5)
model_keras_quantized_pretrained.summary()
/usr/local/lib/python3.11/dist-packages/akida_models/model_io.py:134: UserWarning: Model akidanet_imagenet_224_alpha_50_iq8_wq4_aq4.h5 has been trained with akida_models 1.1.10 which is the last version supporting 1.0 models training
warnings.warn(f'Model {model_name_v1} has been trained with akida_models 1.1.10 which is '
Downloading data from https://data.brainchip.com/models/AkidaV1/akidanet/akidanet_imagenet_224_alpha_50_iq8_wq4_aq4.h5.
0/5589312 [..............................] - ETA: 0s
90112/5589312 [..............................] - ETA: 3s
851968/5589312 [===>..........................] - ETA: 0s
2752512/5589312 [=============>................] - ETA: 0s
4857856/5589312 [=========================>....] - ETA: 0s
5589312/5589312 [==============================] - 0s 0us/step
Download complete.
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling (Rescaling) (None, 224, 224, 3) 0
conv_0 (QuantizedConv2D) (None, 112, 112, 16) 448
conv_0/relu (QuantizedReLU (None, 112, 112, 16) 0
)
conv_1 (QuantizedConv2D) (None, 112, 112, 32) 4640
conv_1/relu (QuantizedReLU (None, 112, 112, 32) 0
)
conv_2 (QuantizedConv2D) (None, 56, 56, 64) 18496
conv_2/relu (QuantizedReLU (None, 56, 56, 64) 0
)
conv_3 (QuantizedConv2D) (None, 56, 56, 64) 36928
conv_3/relu (QuantizedReLU (None, 56, 56, 64) 0
)
separable_4 (QuantizedSepa (None, 28, 28, 128) 8896
rableConv2D)
separable_4/relu (Quantize (None, 28, 28, 128) 0
dReLU)
separable_5 (QuantizedSepa (None, 28, 28, 128) 17664
rableConv2D)
separable_5/relu (Quantize (None, 28, 28, 128) 0
dReLU)
separable_6 (QuantizedSepa (None, 14, 14, 256) 34176
rableConv2D)
separable_6/relu (Quantize (None, 14, 14, 256) 0
dReLU)
separable_7 (QuantizedSepa (None, 14, 14, 256) 68096
rableConv2D)
separable_7/relu (Quantize (None, 14, 14, 256) 0
dReLU)
separable_8 (QuantizedSepa (None, 14, 14, 256) 68096
rableConv2D)
separable_8/relu (Quantize (None, 14, 14, 256) 0
dReLU)
separable_9 (QuantizedSepa (None, 14, 14, 256) 68096
rableConv2D)
separable_9/relu (Quantize (None, 14, 14, 256) 0
dReLU)
separable_10 (QuantizedSep (None, 14, 14, 256) 68096
arableConv2D)
separable_10/relu (Quantiz (None, 14, 14, 256) 0
edReLU)
separable_11 (QuantizedSep (None, 14, 14, 256) 68096
arableConv2D)
separable_11/relu (Quantiz (None, 14, 14, 256) 0
edReLU)
separable_12 (QuantizedSep (None, 7, 7, 512) 133888
arableConv2D)
separable_12/relu (Quantiz (None, 7, 7, 512) 0
edReLU)
separable_13 (QuantizedSep (None, 7, 7, 512) 267264
arableConv2D)
separable_13/global_avg (G (None, 512) 0
lobalAveragePooling2D)
separable_13/relu (Quantiz (None, 512) 0
edReLU)
dropout (Dropout) (None, 512) 0
classifier (QuantizedDense (None, 1000) 513000
)
=================================================================
Total params: 1375880 (5.25 MB)
Trainable params: 1375880 (5.25 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Check model performance on the 10 images set.
from timeit import default_timer as timer
num_images = len(x_test)
start = timer()
potentials_keras = model_keras_quantized_pretrained.predict(x_test, batch_size=100)
end = timer()
print(f'Keras inference on {num_images} images took {end-start:.2f} s.\n')
preds_keras = np.squeeze(np.argmax(potentials_keras, 1))
accuracy_keras = np.sum(np.equal(preds_keras, labels_test)) / num_images
print(f"Keras accuracy: {accuracy_keras*num_images:.0f}/{num_images}.")
1/1 [==============================] - ETA: 0s
1/1 [==============================] - 1s 1s/step
Keras inference on 10 images took 1.20 s.
Keras accuracy: 9/10.
3. Conversion to Akida
3.1 Convert to Akida model
Here, the Keras quantized model is converted into a suitable version for the Akida accelerator. The cnn2snn.convert function only needs the Keras model as argument.
from cnn2snn import convert
model_akida = convert(model_keras_quantized_pretrained)
The Model.summary method provides a detailed description of the Model layers.
model_akida.summary()
Model Summary
________________________________________________
Input shape Output shape Sequences Layers
================================================
[224, 224, 3] [1, 1, 1000] 1 15
________________________________________________
_____________________________________________________________
Layer (type) Output shape Kernel shape
============== SW/conv_0-classifier (Software) ==============
conv_0 (InputConv.) [112, 112, 16] (3, 3, 3, 16)
_____________________________________________________________
conv_1 (Conv.) [112, 112, 32] (3, 3, 16, 32)
_____________________________________________________________
conv_2 (Conv.) [56, 56, 64] (3, 3, 32, 64)
_____________________________________________________________
conv_3 (Conv.) [56, 56, 64] (3, 3, 64, 64)
_____________________________________________________________
separable_4 (Sep.Conv.) [28, 28, 128] (3, 3, 64, 1)
_____________________________________________________________
(1, 1, 64, 128)
_____________________________________________________________
separable_5 (Sep.Conv.) [28, 28, 128] (3, 3, 128, 1)
_____________________________________________________________
(1, 1, 128, 128)
_____________________________________________________________
separable_6 (Sep.Conv.) [14, 14, 256] (3, 3, 128, 1)
_____________________________________________________________
(1, 1, 128, 256)
_____________________________________________________________
separable_7 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1)
_____________________________________________________________
(1, 1, 256, 256)
_____________________________________________________________
separable_8 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1)
_____________________________________________________________
(1, 1, 256, 256)
_____________________________________________________________
separable_9 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1)
_____________________________________________________________
(1, 1, 256, 256)
_____________________________________________________________
separable_10 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1)
_____________________________________________________________
(1, 1, 256, 256)
_____________________________________________________________
separable_11 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1)
_____________________________________________________________
(1, 1, 256, 256)
_____________________________________________________________
separable_12 (Sep.Conv.) [7, 7, 512] (3, 3, 256, 1)
_____________________________________________________________
(1, 1, 256, 512)
_____________________________________________________________
separable_13 (Sep.Conv.) [1, 1, 512] (3, 3, 512, 1)
_____________________________________________________________
(1, 1, 512, 512)
_____________________________________________________________
classifier (Fully.) [1, 1, 1000] (1, 1, 512, 1000)
_____________________________________________________________
3.2 Check performance
The following will only compute accuracy for the 10 images set.
# Check Model performance
start = timer()
accuracy_akida = model_akida.evaluate(x_test, labels_test)
end = timer()
print(f'Inference on {num_images} images took {end-start:.2f} s.\n')
print(f"Accuracy: {accuracy_akida*num_images:.0f}/{num_images}.")
# For non-regression purposes
assert accuracy_akida >= 0.8
Inference on 10 images took 0.25 s.
Accuracy: 8/10.
3.3 Show predictions for a random image
Labels for test images are stored in the akida_models package. The matching
between names (string) and labels (integer) is given through the
imagenet.preprocessing.index_to_label
method.
import matplotlib.pyplot as plt
import matplotlib.lines as lines
from akida_models.imagenet import preprocessing
# Functions used to display the top5 results
def get_top5(potentials, true_label):
"""
Returns the top 5 classes from the output potentials
"""
tmp_pots = potentials.copy()
top5 = []
min_val = np.min(tmp_pots)
for ii in range(5):
best = np.argmax(tmp_pots)
top5.append(best)
tmp_pots[best] = min_val
vals = np.zeros((6,))
vals[:5] = potentials[top5]
if true_label not in top5:
vals[5] = potentials[true_label]
else:
vals[5] = 0
vals /= np.max(vals)
class_name = []
for ii in range(5):
class_name.append(preprocessing.index_to_label(top5[ii]).split(',')[0])
if true_label in top5:
class_name.append('')
else:
class_name.append(
preprocessing.index_to_label(true_label).split(',')[0])
return top5, vals, class_name
def adjust_spines(ax, spines):
for loc, spine in ax.spines.items():
if loc in spines:
spine.set_position(('outward', 10)) # outward by 10 points
else:
spine.set_color('none') # don't draw spine
# turn off ticks where there is no spine
if 'left' in spines:
ax.yaxis.set_ticks_position('left')
else:
# no yaxis ticks
ax.yaxis.set_ticks([])
if 'bottom' in spines:
ax.xaxis.set_ticks_position('bottom')
else:
# no xaxis ticks
ax.xaxis.set_ticks([])
def prepare_plots():
fig = plt.figure(figsize=(8, 4))
# Image subplot
ax0 = plt.subplot(1, 3, 1)
imgobj = ax0.imshow(np.zeros((IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS), dtype=np.uint8))
ax0.set_axis_off()
# Top 5 results subplot
ax1 = plt.subplot(1, 2, 2)
bar_positions = (0, 1, 2, 3, 4, 6)
rects = ax1.barh(bar_positions, np.zeros((6,)), align='center', height=0.5)
plt.xlim(-0.2, 1.01)
ax1.set(xlim=(-0.2, 1.15), ylim=(-1.5, 12))
ax1.set_yticks(bar_positions)
ax1.invert_yaxis()
ax1.yaxis.set_ticks_position('left')
ax1.xaxis.set_ticks([])
adjust_spines(ax1, 'left')
ax1.add_line(lines.Line2D((0, 0), (-0.5, 6.5), color=(0.0, 0.0, 0.0)))
# Adjust Plot Positions
ax0.set_position([0.05, 0.055, 0.3, 0.9])
l1, b1, w1, h1 = ax1.get_position().bounds
ax1.set_position([l1 * 1.05, b1 + 0.09 * h1, w1, 0.8 * h1])
# Add title box
plt.figtext(0.5,
0.9,
"Imagenet Classification by Akida",
size=20,
ha="center",
va="center",
bbox=dict(boxstyle="round",
ec=(0.5, 0.5, 0.5),
fc=(0.9, 0.9, 1.0)))
return fig, imgobj, ax1, rects
def update_bars_chart(rects, vals, true_label):
counter = 0
for rect, h in zip(rects, yvals):
rect.set_width(h)
if counter < 5:
if top5[counter] == true_label:
if counter == 0:
rect.set_facecolor((0.0, 1.0, 0.0))
else:
rect.set_facecolor((0.0, 0.5, 0.0))
else:
rect.set_facecolor('gray')
elif counter == 5:
rect.set_facecolor('red')
counter += 1
# Prepare plots
fig, imgobj, ax1, rects = prepare_plots()
# Get a random image
rng = np.random.default_rng()
img = rng.integers(0, num_images)
# Predict image class
outputs_akida = model_akida.predict(np.expand_dims(x_test[img], axis=0)).squeeze()
# Get top 5 prediction labels and associated names
true_label = labels_test[img]
top5, yvals, class_name = get_top5(outputs_akida, true_label)
# Draw Plots
imgobj.set_data(x_test[img])
ax1.set_yticklabels(class_name, rotation='horizontal', size=9)
update_bars_chart(rects, yvals, true_label)
fig.canvas.draw()
plt.show()
4. Hardware mapping and performance
4.1. Map on hardware
List available Akida devices and check that an NSoC V2, Akida 1.0 production chip is available.
If a device is installed but not detected, reinstalling the driver might help, see the driver setup helper.
devices = akida.devices()
print(f'Available devices: {[dev.desc for dev in devices]}')
assert len(devices), "No device found, this example needs an Akida NSoC_v2 device."
device = devices[0]
assert device.version == akida.NSoC_v2, "Wrong device found, this example needs an Akida NSoC_v2."
Available devices: ['PCIe/NSoC_v2/0']
Map the model on the device
model_akida.map(device)
# Check model mapping: NP allocation and binary size
model_akida.summary()
Model Summary
_____________________________________________________
Input shape Output shape Sequences Layers NPs
=====================================================
[224, 224, 3] [1, 1, 1000] 1 15 68
_____________________________________________________
__________________________________________________________________
Layer (type) Output shape Kernel shape NPs
====== HW/conv_0-classifier (Hardware) - size: 1361244 bytes =====
conv_0 (InputConv.) [112, 112, 16] (3, 3, 3, 16) N/A
__________________________________________________________________
conv_1 (Conv.) [112, 112, 32] (3, 3, 16, 32) 4
__________________________________________________________________
conv_2 (Conv.) [56, 56, 64] (3, 3, 32, 64) 6
__________________________________________________________________
conv_3 (Conv.) [56, 56, 64] (3, 3, 64, 64) 3
__________________________________________________________________
separable_4 (Sep.Conv.) [28, 28, 128] (3, 3, 64, 1) 6
__________________________________________________________________
(1, 1, 64, 128)
__________________________________________________________________
separable_5 (Sep.Conv.) [28, 28, 128] (3, 3, 128, 1) 4
__________________________________________________________________
(1, 1, 128, 128)
__________________________________________________________________
separable_6 (Sep.Conv.) [14, 14, 256] (3, 3, 128, 1) 8
__________________________________________________________________
(1, 1, 128, 256)
__________________________________________________________________
separable_7 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1) 4
__________________________________________________________________
(1, 1, 256, 256)
__________________________________________________________________
separable_8 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1) 4
__________________________________________________________________
(1, 1, 256, 256)
__________________________________________________________________
separable_9 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1) 4
__________________________________________________________________
(1, 1, 256, 256)
__________________________________________________________________
separable_10 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1) 4
__________________________________________________________________
(1, 1, 256, 256)
__________________________________________________________________
separable_11 (Sep.Conv.) [14, 14, 256] (3, 3, 256, 1) 4
__________________________________________________________________
(1, 1, 256, 256)
__________________________________________________________________
separable_12 (Sep.Conv.) [7, 7, 512] (3, 3, 256, 1) 8
__________________________________________________________________
(1, 1, 256, 512)
__________________________________________________________________
separable_13 (Sep.Conv.) [1, 1, 512] (3, 3, 512, 1) 8
__________________________________________________________________
(1, 1, 512, 512)
__________________________________________________________________
classifier (Fully.) [1, 1, 1000] (1, 1, 512, 1000) 1
__________________________________________________________________
4.2. Performance measurement
Power measurement must be enabled on the device’ soc (disabled by default). After sending data for inference, performance measurements are available in the model statistics.
# Enable power measurement
device.soc.power_measurement_enabled = True
# Send data for inference
_ = model_akida.forward(x_test)
# Display floor current
floor_power = device.soc.power_meter.floor
print(f'Floor power: {floor_power:.2f} mW')
# Retrieve statistics
print(model_akida.statistics)
Floor power: 898.90 mW
Average framerate = 53.48 fps
Last inference power range (mW): Avg 1041.50 / Min 898.00 / Max 1185.00 / Std 202.94
Last inference energy consumed (mJ/frame): 19.48
Total running time of the script: (0 minutes 8.021 seconds)