Saurav Muralidharan

Senior Research Scientist, NVIDIA
Email: mail [at] sauravm.com

Twitter | LinkedIn

I am a scientist at NVIDIA Research, working on the Deep Learning Efficiency Research (DLER) team. My work focuses on improving the runtime performance and efficiency of deep neural networks, especially large language models (LLMs), using techniques such as model compression (sparsity, low-rank factorization, distillation, etc.) and neural architecture search (NAS).

Prior to joining NVIDIA, I completed my Ph.D. in Computer Science at the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.

Selected Publications

[ Google Scholar | DBLP ]
Compact Language Models via Pruning and Knowledge Distillation
S. Muralidharan, S. T. Sreenivas, R. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, P. Molchanov
NeurIPS 2024. [ pdf | webpage ]
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
G. Fang, H. Yin, S. Muralidharan, G. Heinrich, J. Pool, J. Kautz, P. Molchanov, X. Wang
NeurIPS 2024 (Spotlight). [ pdf | webpage ]
LLM Pruning and Distillation in Practice: The Minitron Approach
S. T. Sreenivas, S. Muralidharan, R. Joshi, M. Chochowski, M. Patwary, M. Shoeybi, B. Catanzaro, J. Kautz, P. Molchanov
arXiv 2024. [ pdf | webpage ]
Flextron: Many-in-One Flexible Large Language Model
R. Cai, S. Muralidharan, G. Heinrich, H. Yin, Z. Wang, J. Kautz, P. Molchanov
ICML 2024 (Oral). [ pdf ]
HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Y. N. Wu, P. Tsai, S. Muralidharan, A. Parashar, V. Sze, J. Emer
arXiv 2305.12718 (2023). [ pdf ]
Uniform Sparsity in Deep Neural Networks
S. Muralidharan
MLSys 2023. [ pdf ]
Efficient Sparsely Activated Transformers
S. Latifi, S. Muralidharan, M. Garland
arXiv 2208.14580 (2022). [ pdf ]

Mentorship

NVIDIA Research Interns