Senior Research Scientist, NVIDIA
Email: mail [at] sauravm.com
I am a scientist at NVIDIA Research, working in the Deep Learning Efficiency Research (DLER) team. My work focuses on improving the runtime performance and efficiency of deep neural networks, especially large language models (LLMs), using techniques like model compression (sparsity, low-rank factorization, distillation, etc.) and neural architecture search (NAS).
Prior to joining NVIDIA, I completed my Ph.D. in Computer Science at the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.
Compact Language Models via Pruning and Knowledge Distillation. S. Muralidharan, S. T. Sreenivas, R. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, P. Molchanov. arXiv. [pdf | webpage]
Flextron: Many-in-One Flexible Large Language Model. R. Cai, S. Muralidharan, G. Heinrich, H. Yin, Z. Wang, J. Kautz, P. Molchanov. ICML 2024 (Oral). [pdf]
HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity. Y. N. Wu, P. Tsai, S. Muralidharan, A. Parashar, V. Sze, J. Emer. arXiv 2305.12718 (2023). [pdf]
Uniform Sparsity in Deep Neural Networks. S. Muralidharan. Sixth Conference on Machine Learning and Systems (MLSys 2023). [pdf]
Efficient Sparsely Activated Transformers. S. Latifi, S. Muralidharan, M. Garland. arXiv 2208.14580 (2022). [pdf]