Senior Research Scientist, NVIDIA
Email: mail [at] sauravm.com
I am a scientist at NVIDIA Research, working on the Deep Learning Efficiency Research (DLER) team. My work focuses on improving the runtime performance and efficiency of deep neural networks, especially large language models (LLMs), using techniques such as model compression (sparsity, low-rank factorization, distillation, etc.) and neural architecture search (NAS).
Prior to joining NVIDIA, I completed my Ph.D. in Computer Science at the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.
Compact Language Models via Pruning and Knowledge Distillation. S. Muralidharan, S. T. Sreenivas, R. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, P. Molchanov. NeurIPS 2024. [pdf | webpage]
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models. G. Fang, H. Yin, S. Muralidharan, G. Heinrich, J. Pool, J. Kautz, P. Molchanov, X. Wang. NeurIPS 2024 (Spotlight). [pdf | webpage]
LLM Pruning and Distillation in Practice: The Minitron Approach. S. T. Sreenivas, S. Muralidharan, R. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, P. Molchanov. arXiv 2024. [pdf | webpage]
Flextron: Many-in-One Flexible Large Language Model. R. Cai, S. Muralidharan, G. Heinrich, H. Yin, Z. Wang, J. Kautz, P. Molchanov. ICML 2024 (Oral). [pdf]
HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity. Y. N. Wu, P. Tsai, S. Muralidharan, A. Parashar, V. Sze, J. Emer. arXiv 2305.12718 (2023). [pdf]
Uniform Sparsity in Deep Neural Networks. S. Muralidharan. Sixth Conference on Machine Learning and Systems (MLSys 2023). [pdf]
Efficient Sparsely Activated Transformers. S. Latifi, S. Muralidharan, M. Garland. arXiv 2208.14580 (2022). [pdf]