Exploring optimization algorithms for deep sequence models (available)

Starting Date: June 2025
Prerequisites: Good knowledge of Python programming is essential. Familiarity with deep learning and with a deep learning framework (such as PyTorch, JAX, or TensorFlow) is important.
Will results be assigned to University: No

Deep sequence models such as recurrent neural networks (RNNs) are a key class of architectures in modern deep learning, particularly for processing sequential data such as text, speech, video, and time series. RNNs have recurrent connections that allow information to persist and be passed from one step to the next, enabling them to model patterns and dependencies in sequences effectively. This makes RNNs well suited to tasks like language modelling, machine translation, speech recognition, and time series forecasting. Some recent frontier models are recurrent architectures [1], including Google’s RecurrentGemma [2].
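To make the recurrence concrete, the following minimal sketch (written in JAX only because it is one of the frameworks listed in the prerequisites; the function names, dimensions, and initialisation are illustrative, not project code) shows a vanilla RNN step being scanned over a sequence, with the hidden state carrying information from one step to the next:

    import jax
    import jax.numpy as jnp

    def rnn_step(h, x, W_h, W_x, b):
        # One step of a vanilla (Elman) RNN: the new hidden state depends on
        # both the previous hidden state and the current input.
        return jnp.tanh(h @ W_h + x @ W_x + b)

    def run_rnn(params, h0, inputs):
        # Scan the step function over the sequence; the carried hidden state
        # is how information persists from one time step to the next.
        W_h, W_x, b = params
        def step(h, x):
            h_new = rnn_step(h, x, W_h, W_x, b)
            return h_new, h_new
        h_last, hs = jax.lax.scan(step, h0, inputs)
        return h_last, hs

    # Toy example: sequence length 10, input dim 4, hidden dim 8.
    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    params = (0.1 * jax.random.normal(k1, (8, 8)),   # W_h
              0.1 * jax.random.normal(k2, (4, 8)),   # W_x
              jnp.zeros(8))                          # b
    h_last, hs = run_rnn(params, jnp.zeros(8), jax.random.normal(k3, (10, 4)))
    print(hs.shape)  # (10, 8): one hidden state per time step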

In this project, the goal will be to study optimization algorithms for recurrent neural networks in order to improve their performance and parallelize their training. Specifically, we will explore how optimization algorithms such as second-order gradient methods [3] or biologically inspired training methods [4] can be applied to modern RNNs. Alternatively, we will explore methods for parallelizing the training of RNNs [5, 6] so that they can be scaled up.
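As an illustration of the parallelization idea behind [5, 6], the sketch below (again in JAX; the function names are illustrative and this is not code from either paper) evaluates the linear recurrence h_t = a_t * h_{t-1} + b_t with an associative prefix scan instead of a sequential loop, which is the prefix-sums trick that lets the time dimension be processed in parallel:

    import jax
    import jax.numpy as jnp

    def linear_recurrence_parallel(a, b):
        # h_t = a_t * h_{t-1} + b_t with h_0 = 0. Each element (a_t, b_t) is an
        # affine map; composing (a1, b1) then (a2, b2) gives (a2*a1, a2*b1 + b2),
        # which is associative, so a parallel prefix scan applies [6].
        def combine(left, right):
            a_l, b_l = left
            a_r, b_r = right
            return a_r * a_l, a_r * b_l + b_r
        _, h = jax.lax.associative_scan(combine, (a, b))
        return h

    def linear_recurrence_sequential(a, b):
        # Sequential reference implementation for comparison.
        h, hs = 0.0, []
        for a_t, b_t in zip(a, b):
            h = a_t * h + b_t
            hs.append(h)
        return jnp.stack(hs)

    a = jnp.array([0.9, 0.5, 0.8, 0.7])
    b = jnp.array([1.0, 2.0, 3.0, 4.0])
    print(linear_recurrence_parallel(a, b))    # [1.  2.5  5.  7.5]
    print(linear_recurrence_sequential(a, b))  # matches the parallel version

Nonlinear RNNs do not reduce directly to such a linear recurrence, which is precisely the gap addressed in [5]; the sketch only illustrates why linear recurrences parallelize so well.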

Students are welcome to email (anand.subramoney@rhul.ac.uk) for informal discussions.

Reading:

  1. Gu, A., Dao, T., 2023. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. https://doi.org/10.48550/arXiv.2312.00752
  2. De, S., et al., 2024. Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models. https://doi.org/10.48550/arXiv.2402.19427
  3. Anil, R., Gupta, V., Koren, T., Regan, K., Singer, Y., 2021. Scalable Second Order Optimization for Deep Learning. https://doi.org/10.48550/arXiv.2002.09018
  4. Bellec, G., Scherr, F., Subramoney, A., Hajek, E., Salaj, D., Legenstein, R., Maass, W., 2020. A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications 11, 3625. https://doi.org/10.1038/s41467-020-17236-y
  5. Gonzalez, X., Warrington, A., Smith, J. T. H., Linderman, S. W., 2024. Towards Scalable and Stable Parallelization of Nonlinear RNNs. https://doi.org/10.48550/arXiv.2407.19115
  6. Blelloch, G. E., 1990. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University.