Jan 29 2021

Towards Best Possible Deep Learning Acceleration on the Edge – A Compression-Compilation Co-Design Framework

ECE 595 Department Seminar Series

January 29, 2021

11:00 AM - 12:00 PM




Chicago, IL 60607

Presenter: Yanzhi Wang, Northeastern University

Abstract: Mobile and embedded computing devices have become key carriers of deep learning, facilitating the widespread adoption of machine intelligence. However, achieving real-time DNN inference on edge devices is a widely recognized challenge because of the limited computation and storage resources on such devices. Model compression of DNNs, including weight pruning and weight quantization, has been investigated to overcome this challenge. However, current work on DNN compression suffers from the limitation that accuracy and hardware performance are somewhat conflicting goals that are difficult to satisfy simultaneously.

We present our recent work CoCoPIE, which represents Compression-Compilation Co-Design, to overcome this limitation and move towards the best possible DNN acceleration on edge devices. We propose novel fine-grained structured pruning schemes, including pattern-based pruning and block-based pruning. With the help of the compiler, they achieve high hardware performance (similar to filter/channel pruning) while maintaining zero accuracy loss, which is beyond the capability of prior work. Similarly, we present a novel quantization scheme that achieves ultra-high hardware performance close to that of 2-bit weight quantization, with almost no accuracy loss. Through the CoCoPIE framework, we are able to achieve real-time on-device execution of a number of DNN tasks, including object detection, pose estimation, activity detection, and speech recognition, using just an off-the-shelf mobile device, with up to 180X speedup compared with prior work. Our comprehensive demonstrations are at :

Speaker bio: Yanzhi Wang is currently an assistant professor at Northeastern University in Boston, MA. He received his PhD from the University of Southern California in 2014 and his BS from Tsinghua University in Beijing, China in 2009. His research interests focus on model compression and platform-specific acceleration of deep learning applications. His research has maintained the highest model compression rates on representative DNNs since September 2018. His work on AQFP superconducting-based DNN acceleration achieves by far the highest energy efficiency among all hardware devices. His recent research achievement, CoCoPIE, can achieve real-time performance on almost all deep learning applications using off-the-shelf mobile devices, outperforming competing frameworks by up to 180X.

His work has been published broadly in top conference and journal venues, including DAC, ICCAD, ASPLOS, ISCA, MICRO, HPCA, PLDI, ICS, PACT, ISSCC, AAAI, ICML, CVPR, ICLR, IJCAI, ECCV, ICDM, ACM MM, FPGA, LCTES, CCS, VLDB, ICDCS, Infocom, C-ACM, JSSC, TComputer, TCAS-I, TCAD, JSAC, and TNNLS, and has been cited more than 8,200 times. He has received five Best Paper and Top Paper Awards, ten Best Paper nominations, and four Popular Paper Awards. He has also received the U.S. Army Young Investigator Program (YIP) Award, the Massachusetts Acorn Innovation Award, the Ming Hsieh Scholar Award, and other research awards, including awards from Google and MathWorks. Three of his former PhD and postdoctoral students have become tenure-track faculty at the University of Connecticut, Clemson University, and Texas A&M University, Corpus Christi.

Faculty host: Amit Trivedi

This event will not be recorded


Department of Electrical and Computer Engineering

Date posted

Jan 28, 2021

Date updated

Jan 28, 2021