Project Motivation and Goals
The k-Nearest-Neighbours (kNN) algorithm is a popular Machine Learning technique that can be used for a variety of supervised classification tasks. In contrast to other machine learning algorithms which "encode" the knowledge gained from training data to a set of parameters, such as weights and biases, the parameter set of a kNN classifier consists of just labelled training examples. Classification of an unlabelled example takes place by calculating its Euclidean distance (or any other type of distance metric) from all the stored training examples. The label of the unknown example is determined by the majority of the labels of the k closest examples.
In contrast to other Machine Learning methods, kNN classification doesn't require an explicit training phase, which makes it a potential candidate for learning directly from the data captured by an embedded system. However, the computation of the distance metrics between a new unclassified example and the labelled examples stored as the classifier's parameter set acts as a bottleneck in the classification process, if it is implemented in software. Through the development of dedicated hardware for calculating the required distance metrics, it is possible to reduce inference time.
The goal of this project is to build a hardware implementation of a kNN classifier that can achieve reduced inference time when compared to a kNN classifier implemented fully in software, through the parallelisation of distance metric computations. The project has an educational purpose and it expected that by its completion, the contributors will have gone through the complete design cycle of a SoC.
The first step in defining the design specification for the accelerator was to determine the type of data that need to be processed by it. The Fashion-MNIST dataset has been selected as the primary data source. Therefore, the accelerator should be able to handle grayscale image vectors with a size of 28 x 28 pixels. We assume that some of these images will be labelled (training set), while some of them won't (test set). The accelerator should help with computing the distance between all the labelled images and each unknown image in the unlabelled set.
Data to be processed: N 28 x 28 8-bit (grayscale) labelled image vectors and M unlabelled image vectors, stored in SRAM.
Image vectors a(i), b(j) have D elements, where D = 28 x 28 = 784. Each element is represented as an 8-bit unsigned integer value. If N = 200 and M = 50, the total storage requirements for matrices A and B are:
N x 785 + M x 784 = 200 x 785 + 50 x 784 = 196,200 bytes, which is approximately equal to 192 kB. It is assumed that one byte of storage is required for storing every label. This is the reason why the number of labelled examples N is multiplied by 785 and not 784.
We would like to compute the Squared Euclidean distance between each pair of vectors a(i), b(j). This computation is given by the sum of 3 dot products, as shown below:
The self-dot products a(i)⋅a(i) and b(j)⋅b(j) can only be computed once and then reused for another computation.
Because of the high dimensionality of the image vectors, it is proposed to divide the computation of the full dot products into a series of rounds. In each round, partial dot products are computed over a small number of elements. The partial dot products from each round, can be cached into an accumulator and reused in the next round to incrementally calculate the final full dot product. In the last round, the full-dot products are moved to SRAM.
In the case, where the full self-dot products are known, the modules of the accelerator that compute them should be turned off through configuration bits.
Accelerator Input Data Packet
Each round of computation can take place on 4 vector elements from each image vector. With 4 vector elements per round, the complete computation is expected to be completed after 784 / 4 = 196 rounds. Each input packet for the accelerator wrapper is built from 32-bit words. Therefore, there is a need for (2 x 4 x 8) / 32 = 2 words per data packet.
First abstract description of project goalsTarget Date
Rough outline of the project's goals, hardware accelerator idea and motivation.
Architectural description and Familirisation with design toolsTarget Date
- Specify the number format used for the arithmetic operations of the accelerator (i.e. 8-bit unsigned integers for grayscale images).
- Select the distance metric used for kNN classification (i.e. Euclidean distances).
- Determine the type of building blocks (i.e. adders, multipliers) required to perform the arithmetic operations.
- Capture the architectural design in block diagram format, specifying the necessary inputs, outputs, registers and the overall data flow.
- Complete the lab exercises from the "Introduction to SoC design" course to gain familiarity with Xilinx Vivado.
Milestone #3Target Date
Milestone #4Target Date
Milestone #5Target Date
Milestone #6Target Date