Senior Research Scientist, NVIDIA
Email: mail [at] sauravm.com
I am a scientist at NVIDIA Research, where I work on improving the performance and efficiency of deep neural networks. My areas of interest include neural architecture search (NAS), sparse and dynamic neural networks, and compiler- and runtime-based optimization of machine learning models.
Prior to joining NVIDIA, I completed my Ph.D. in Computer Science at the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.
Efficient Sparsely Activated Transformers
S. Latifi, S. Muralidharan, M. Garland
arXiv 2208.14580 (2022). [pdf]
Going Beyond Classification Accuracy Metrics in Model Compression
V. Joseph, S. A. Siddiqui, A. Bhaskara, G. Gopalakrishnan, S. Muralidharan, M. Garland, S. Ahmed, A. Dengel
arXiv 2012.01604 (2021). [pdf]
A Programmable Approach to Neural Network Compression
V. Joseph, G. Gopalakrishnan, S. Muralidharan, M. Garland, A. Garg
IEEE Micro Special Issue on Machine Learning for Systems, 2020.
[code | pdf (arXiv) | talk]
Designing a Tunable Nested Data-Parallel Programming System
S. Muralidharan, M. Garland, A. Sidelnik, M. Hall
ACM Transactions on Architecture and Code Optimization (TACO '16). [pdf]
Abstractions and Strategies for Adaptive Programming
Ph.D. Dissertation, University of Utah, December 2016. [pdf]
Architecture-Adaptive Code Variant Tuning
S. Muralidharan, A. Roy, M. Hall, M. Garland, P. Rai
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16), April 2016, Atlanta, GA.
[pdf | slides]
A Collection-Oriented Programming Model for Performance Portability
S. Muralidharan, M. Garland, B. Catanzaro, A. Sidelnik, M. Hall
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15) (short paper), February 2015, San Francisco, CA. [pdf]
Nitro: A Framework for Adaptive Code Variant Tuning
S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro
IEEE International Parallel & Distributed Processing Symposium (IPDPS '14), May 2014, Phoenix, AZ.
[code | pdf | slides]
Towards Making Autotuning Mainstream
P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat
International Journal of High Performance Computing Applications, Volume 27 (IJHPCA '13), November 2013.
Condensa is a framework for programmable model compression in Python. It comes with a set of built-in compression operators which may be used to compose complex compression schemes targeting specific combinations of DNN architecture, hardware platform, and optimization objective. To recover any accuracy lost during compression, Condensa uses a constrained optimization formulation of model compression and employs an Augmented Lagrangian-based algorithm as the optimizer.
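As a rough illustration of the constrained-optimization idea described above, here is a minimal sketch of an augmented Lagrangian loop on a toy problem. It is not Condensa's API or its actual implementation: the function names (`project_top_k`, `compress_toy`), the quadratic stand-in loss, the choice of magnitude pruning as the compression scheme, and the penalty schedule are all assumptions made for illustration.

```python
from typing import List, Tuple


def project_top_k(values: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries and zero the rest (magnitude pruning)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def compress_toy(target: List[float], k: int, mu: float = 0.1,
                 mu_growth: float = 1.5, steps: int = 40
                 ) -> Tuple[List[float], List[float]]:
    """Toy problem: minimize 0.5 * ||w - target||^2 subject to w == theta,
    where theta may have at most k nonzero entries.

    Augmented Lagrangian:
        0.5 * ||w - target||^2 - lam . (w - theta) + (mu / 2) * ||w - theta||^2
    """
    w = list(target)
    theta = project_top_k(w, k)
    lam = [0.0] * len(w)  # Lagrange multiplier estimates
    for _ in range(steps):
        # Learning step: closed-form minimizer over w (the loss is quadratic here;
        # a real network would need gradient-based training instead).
        w = [(t + mu * th + l) / (1.0 + mu)
             for t, th, l in zip(target, theta, lam)]
        # Compression step: project the shifted weights onto the k-sparse set.
        theta = project_top_k([wi - l / mu for wi, l in zip(w, lam)], k)
        # Multiplier update, then tighten the penalty.
        lam = [l - mu * (wi - th) for l, wi, th in zip(lam, w, theta)]
        mu *= mu_growth
    return w, theta
```

Calling `compress_toy([3.0, -0.1, 0.5, -2.0], k=2)` should drive `w` toward a `theta` that keeps only the two largest-magnitude entries, with the gap between the two shrinking as the penalty grows.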
Tensor methods generalize matrix algebraic operations to higher orders, and can help deep neural networks better preserve and leverage local structure. TensorLy-Torch is a PyTorch library that builds on top of TensorLy and provides out-of-the-box tensor layers. It comes with all batteries included and tries to make it as easy as possible to use tensor methods within your deep networks.
Nitro Autotuning Framework
Nitro is a programmer-directed code variant tuning framework, jointly developed by the University of Utah and NVIDIA Research. It utilizes machine learning-based classification to automatically find the best implementation (variant) of a computation for a given input. Nitro provides C++ and Python interfaces for programmers to specify variants, input dataset features, and constraints.
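The general idea of classification-based variant selection can be sketched as follows. This is a hypothetical illustration, not Nitro's interface: the `VariantTuner` class, its methods, the two sorting variants, and the nearest-neighbor classifier standing in for whatever model a real tuner would train are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Features = Tuple[float, ...]
Variant = Callable[[List[int]], List[int]]


def insertion_sort(xs: List[int]) -> List[int]:
    """Variant assumed to suit small inputs."""
    out = list(xs)
    for i in range(1, len(out)):
        item, j = out[i], i - 1
        while j >= 0 and out[j] > item:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = item
    return out


def builtin_sort(xs: List[int]) -> List[int]:
    """Variant assumed to suit large inputs."""
    return sorted(xs)


class VariantTuner:
    """Maps input features to the variant labeled best for similar inputs."""

    def __init__(self, variants: Dict[str, Variant],
                 feature_fn: Callable[[List[int]], Features]) -> None:
        self.variants = variants
        self.feature_fn = feature_fn
        self.examples: List[Tuple[Features, str]] = []

    def train(self, labeled: List[Tuple[Features, str]]) -> None:
        # Each example pairs a feature vector with the name of the variant
        # that performed best on that training input.
        for _, name in labeled:
            if name not in self.variants:
                raise KeyError(f"unknown variant: {name}")
        self.examples = list(labeled)

    def select(self, x: List[int]) -> str:
        # 1-nearest-neighbor classification in feature space.
        if not self.examples:
            raise RuntimeError("train() must be called before select()")
        feats = self.feature_fn(x)

        def dist(example: Tuple[Features, str]) -> float:
            return sum((a - b) ** 2 for a, b in zip(feats, example[0]))

        return min(self.examples, key=dist)[1]

    def __call__(self, x: List[int]) -> List[int]:
        # Dispatch the input to the predicted variant.
        return self.variants[self.select(x)](x)
```

In this sketch the programmer supplies the variants and a feature function (here, something like input length), and the training labels would come from timing each variant offline on representative inputs.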