Urmish is a Deep Learning Researcher at SambaNova Systems. Before joining SambaNova, he worked at Arm Research, AMD, Texas Instruments, and Broadcom. His research has primarily focused on the efficient execution of neural networks on resource-constrained devices. Specifically, he has worked on model quantization, pruning, structured matrices, and low-rank decomposition. His work has led to patents, publications, and contributions to products across multiple companies.
Urmish completed his Master's in Computer Science at the University of Wisconsin–Madison in the US and his Bachelor's at BITS Pilani in India.
This talk gives an overview of my work exploring Kronecker Products (KP) to compress sequence-based neural networks. The talk is divided into two parts. In the first part, we show that KP can compress IoT LSTM applications by factors of 15-38x, achieving better results than traditional compression methods. This part includes a quick tutorial on KP and the best methodology for using KP to compress IoT workloads. However, when KP is applied to large natural language processing (NLP) tasks, it leads to significant accuracy loss. The second part of the talk addresses this issue: we show how to recover the accuracy otherwise lost when applying KP compression to large NLP tasks, using a novel technique that we call doping. Doping adds an extremely sparse overlay matrix on top of the pre-defined KP structure; we call the resulting compression method Doped Kronecker Product (DKP). We present experimental results demonstrating that DKP compresses a large language model with LSTM layers of size 25 MB by 25x with only a 1.4% loss in perplexity score, outperforming other traditional compression techniques.
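To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of KP compression and the doping overlay. The matrix sizes, factor shapes, and 1% overlay density are illustrative assumptions, not the configurations used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Kronecker Product compression: a 256x256 weight matrix is expressed
# as the Kronecker product of two small factors A (16x16) and B (16x16).
A = rng.standard_normal((16, 16))
B = rng.standard_normal((16, 16))
W_kp = np.kron(A, B)                 # shape (256, 256)

dense_params = 256 * 256             # 65,536 parameters if stored densely
kp_params = A.size + B.size          # 512 parameters -> 128x fewer

# Doping: add an extremely sparse overlay matrix S on top of the rigid
# KP structure, letting a few entries deviate to recover accuracy.
# Here ~1% of entries are nonzero (an illustrative density).
S = np.zeros_like(W_kp)
idx = rng.choice(W_kp.size, size=int(0.01 * W_kp.size), replace=False)
S.flat[idx] = rng.standard_normal(idx.size)
W_dkp = W_kp + S                     # Doped Kronecker Product matrix
```

In practice A, B, and the nonzero entries of S would all be learned during training; the sketch only shows why the parameter count drops so sharply while the sparse overlay stays cheap to store.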