Saurav Muralidharan

Senior Research Scientist, NVIDIA
Email: mail [at] sauravm.com

Twitter | LinkedIn | CV

I am a scientist at NVIDIA Research, working in the Deep Learning Efficiency Research (DLER) team. My work focuses on improving the runtime performance and efficiency of deep neural networks, especially large language models (LLMs), using techniques like model compression (sparsity, low-rank factorization, distillation, etc.) and neural architecture search (NAS).

Prior to joining NVIDIA, I completed my Ph.D. in Computer Science at the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.

Recent Publications

[ Google Scholar | DBLP ]

Compact Language Models via Pruning and Knowledge Distillation
S. Muralidharan, S. T. Sreenivas, R. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, P. Molchanov
arXiv. [ pdf | webpage ]

Flextron: Many-in-One Flexible Large Language Model
R. Cai, S. Muralidharan, G. Heinrich, H. Yin, Z. Wang, J. Kautz, P. Molchanov
ICML 2024 (Oral). [ pdf ]

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Y. N. Wu, P. Tsai, S. Muralidharan, A. Parashar, V. Sze, J. Emer
arXiv 2305.12718 (2023). [ pdf ]

Uniform Sparsity in Deep Neural Networks
S. Muralidharan
Sixth Conference on Machine Learning and Systems (MLSys 2023). [ pdf ]

Efficient Sparsely Activated Transformers
S. Latifi, S. Muralidharan, M. Garland
arXiv 2208.14580 (2022). [ pdf ]

Professional Service