While doing my Integrated Master's, I decided to do a Minor in Machine Learning (ML), and I immediately realized that it was a subject I would very much love to work with during my Master's Thesis. Since I am an Electronics Engineer, and not a Computer Scientist, I wondered how I could match the two subjects together.
As I started doing the literature review for my Thesis, I realized that this was actually a very hot topic: efficient hardware implementations of ML algorithms are trending right now. This is because, for applications with very tight timing and performance requirements, a purely software approach does not perform well enough. Furthermore, hardware implementations can provide much better power efficiency, an aspect that is critical for embedded environments.
ML/AI is a very broad subject, and there is a plethora of algorithms to choose from, each one better suited to a given task or problem. Additionally, since this was a Master's Thesis, I was looking for an algorithm where I could improve the state of the art, so I did not want to choose one that was too mainstream. I ended up proposing to implement a Recurrent Neural Network (RNN) topology called Long Short-Term Memory (LSTM). An RNN is a Neural Network where the outputs are fed back into the inputs; RNNs are generally used in problems involving time-series data. However, they suffer from a few issues, like exploding/vanishing gradients and the inability to preserve long-term information within the network structure. LSTMs address both issues.
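To make the recurrence concrete, here is a minimal sketch of a vanilla RNN step in NumPy. This is not my thesis code: the dimensions are made up for illustration, and the random weights stand in for parameters that would normally be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the example.
n_in, n_hidden = 4, 8

# Random weights stand in for trained parameters.
W_x = rng.standard_normal((n_hidden, n_in)) * 0.1
W_h = rng.standard_normal((n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One vanilla RNN step: the previous hidden state is fed back in."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Run a short input sequence through the recurrence.
h = np.zeros(n_hidden)
for t in range(5):
    x_t = rng.standard_normal(n_in)
    h = rnn_step(x_t, h)

print(h.shape)  # (8,)
```

Because `h` is multiplied by `W_h` at every step, gradients flowing back through time are repeatedly scaled by that matrix, which is exactly where the exploding/vanishing gradient problem comes from; the LSTM's gated cell state is designed to sidestep it.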
At the time of publishing, in 2016, my implementation improved on the state of the art, being the most efficient implementation to date. The network ran on weights that had been previously trained on a software model. After this main objective was completed, we wondered if we could perform on-chip learning. We chose SPSA, a statistical method that estimates gradients instead of explicitly calculating them, a technique better suited to hardware computation. Unfortunately, due to time restrictions, we could not develop this possibility further.
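To give a rough idea of why SPSA is attractive for hardware, here is a toy sketch (not the on-chip design; the quadratic loss, step sizes, and iteration count are made up for the example). A single random perturbation of all parameters at once yields a full gradient estimate from just two loss evaluations, with no backpropagation machinery:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(theta):
    # Toy quadratic loss standing in for the network's training error.
    return float(np.sum((theta - 1.0) ** 2))

def spsa_gradient(f, theta, c=0.1):
    """Estimate the gradient of f at theta via SPSA.

    Every parameter is perturbed simultaneously by a random +/-1
    (Rademacher) vector; the two-sided difference of just two loss
    evaluations then gives an estimate for the whole gradient.
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = f(theta + c * delta) - f(theta - c * delta)
    return diff / (2.0 * c) * (1.0 / delta)

# Simple gradient descent driven by the SPSA estimate.
theta = np.zeros(3)
for _ in range(200):
    theta -= 0.05 * spsa_gradient(loss, theta)

print(theta)  # should approach the minimum at [1, 1, 1]
```

The appeal for hardware is that the cost per update is two forward passes, regardless of how many parameters the network has, whereas explicit gradient computation grows with the network structure.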
If you want the technical details about my work, you have a few possibilities: