Starting Date: June 2025
Prerequisites: Solid knowledge of Python programming is essential. Familiarity with deep learning and the PyTorch/JAX frameworks is also important.
Will results be assigned to University: No
Model distillation for large language models (LLMs) addresses a key challenge in AI research: how to compress massive, computationally expensive models into smaller, more efficient versions while preserving their performance. Large models, such as GPT-4, require vast amounts of memory and processing power, making them impractical for real-time applications on edge devices or personal computers. Distillation techniques aim to transfer knowledge from a large “teacher” model to a smaller “student” model.
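To give a concrete feel for what “transferring knowledge” means in practice, the sketch below shows a standard soft-target distillation loss in PyTorch: the student is trained to match the teacher's temperature-softened token distributions via a KL divergence. The function name, temperature value, and tensor shapes are illustrative assumptions, not part of the project specification.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the same temperature, then compute
    # KL(teacher || student) averaged per token. Scaling by T^2 keeps
    # gradient magnitudes comparable across temperatures.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2

# Toy usage with random logits standing in for real teacher/student outputs;
# in practice the teacher's logits would be computed without gradients.
batch, seq_len, vocab = 2, 8, 100
teacher_logits = torch.randn(batch, seq_len, vocab)
student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```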
Recent high-profile models such as DeepSeek-R1 leverage distillation [1] to produce smaller, capable models at significantly lower cost.
This project will explore innovative approaches to distillation in the context of recurrent language models [2], including novel loss functions, and will offer hands-on experience with cutting-edge AI compression techniques. If you’re excited about optimizing AI for real-world applications, this is your chance to dive in!
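As one rough illustration of the kind of loss the project might investigate, distilling a Transformer teacher into a recurrent student can combine the token-level loss above with terms that align intermediate representations, in the spirit of [2]. The sketch below assumes teacher and student expose per-layer hidden states of compatible sequence length and bridges their dimensions with a learned projection; all names, shapes, and the choice of an L2 penalty are illustrative assumptions, not the project's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HiddenStateAlignment(nn.Module):
    # Illustrative layer-wise alignment loss: project the student's hidden
    # states into the teacher's dimension and penalize their L2 distance.
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim, bias=False)

    def forward(self, student_hidden, teacher_hidden):
        # student_hidden: (batch, seq, student_dim)
        # teacher_hidden: (batch, seq, teacher_dim)
        return F.mse_loss(self.proj(student_hidden), teacher_hidden)

# Toy usage with random tensors in place of real model activations.
align = HiddenStateAlignment(student_dim=256, teacher_dim=512)
student_h = torch.randn(2, 8, 256)
teacher_h = torch.randn(2, 8, 512)
loss = align(student_h, teacher_h)
```

In a full training loop such an alignment term would typically be added to the token-level distillation loss with a tunable weight.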
Reading:
- [1] DeepSeek-AI et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Preprint at https://doi.org/10.48550/arXiv.2501.12948 (2025).
- [2] Wang, J., Paliotta, D., May, A., Rush, A. M. & Dao, T. The Mamba in the Llama: Distilling and Accelerating Hybrid Models. Preprint at https://doi.org/10.48550/arXiv.2408.15237 (2025).