While doing my Integrated Master's, I decided to do a Minor in Machine Learning (ML), and I immediately realized that it was a subject I would very much love to work with during my Master's Thesis. Since I am an Electronics Engineer, and not a Computer Scientist, I wondered how I could match the two subjects together.
As I started doing the literature review for my Thesis, I realized that this was actually a very hot topic: efficient hardware implementations of ML algorithms are trending right now. This is because, for applications with very tight timing and performance requirements, a purely software approach does not perform well enough. Furthermore, hardware implementations can provide much better power efficiency, an aspect that is critical for embedded environments.
ML/AI is a very broad subject, and there is a plethora of algorithms to choose from, each one better suited to a given task or problem. Additionally, since this was a Master's Thesis, I was looking for an algorithm where I could improve the state of the art, so I did not want to choose one that was too mainstream. I ended up proposing to implement a Recurrent Neural Network (RNN) topology called Long Short-Term Memory (LSTM). An RNN is a Neural Network where the outputs are fed back into the inputs; RNNs are generally used in problems involving time-series data. However, they suffer from a few issues, like exploding/vanishing gradients and the inability to preserve long-term information within the network structure. LSTMs address both issues.
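To make the recurrence concrete, here is a minimal sketch of a vanilla RNN step in NumPy. This is not my thesis code: the dimensions are made up for illustration, and the random weights stand in for parameters that would normally be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the example.
n_in, n_hidden = 4, 8

# Random weights stand in for trained parameters.
W_x = rng.standard_normal((n_hidden, n_in)) * 0.1
W_h = rng.standard_normal((n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One vanilla RNN step: the previous hidden state is fed back in."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Run a short input sequence through the recurrence.
h = np.zeros(n_hidden)
for t in range(5):
    x_t = rng.standard_normal(n_in)
    h = rnn_step(x_t, h)

print(h.shape)  # (8,)
```

Because `h` is multiplied by `W_h` at every step, gradients flowing back through time are repeatedly scaled by that matrix, which is exactly where the exploding/vanishing gradient problem comes from; the LSTM's gated cell state is designed to sidestep it.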
At the time of publishing, in 2016, my implementation improved on the state of the art, being the most efficient implementation to date. The network ran on weights that had been previously trained on a software model. After this main objective was completed, we wondered if we could perform on-chip learning. We chose SPSA, a statistical method that estimates gradients instead of explicitly calculating them, a technique better suited to hardware computation. Unfortunately, due to time restrictions, we could not develop this possibility further.
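To give a rough idea of why SPSA is attractive for hardware, here is a toy sketch (not the on-chip design; the quadratic loss, step sizes, and iteration count are made up for the example). A single random perturbation of all parameters at once yields a full gradient estimate from just two loss evaluations, with no backpropagation machinery:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(theta):
    # Toy quadratic loss standing in for the network's training error.
    return float(np.sum((theta - 1.0) ** 2))

def spsa_gradient(f, theta, c=0.1):
    """Estimate the gradient of f at theta via SPSA.

    Every parameter is perturbed simultaneously by a random +/-1
    (Rademacher) vector; the two-sided difference of just two loss
    evaluations then gives an estimate for the whole gradient.
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = f(theta + c * delta) - f(theta - c * delta)
    return diff / (2.0 * c) * (1.0 / delta)

# Simple gradient descent driven by the SPSA estimate.
theta = np.zeros(3)
for _ in range(200):
    theta -= 0.05 * spsa_gradient(loss, theta)

print(theta)  # should approach the minimum at [1, 1, 1]
```

The appeal for hardware is that the cost per update is two forward passes, regardless of how many parameters the network has, whereas explicit gradient computation grows with the network structure.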
If you want the technical details about my work, you have a few possibilities: