Hybrid CMOS-NVM Integrated Circuits for Energy-efficient Neuromorphic Computing and Edge-AI

Date
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
The rapid growth of data generation strains the computational and networking resources required for data routing, storage, and processing. The high performance of Machine Learning (ML) and AI algorithms has made them a staple choice for solving such problems. Unfortunately, that performance comes at a significant energy cost: the traditional von Neumann architecture makes memory access energy-demanding, and ML/AI algorithms shuttle large amounts of data in and out of memory. Neuromorphic computing promises to relieve this memory energy bottleneck by performing computations within memory. A neuromorphic computing architecture relies on several core elements: devices, arrays, circuits, and algorithms. A literature review shows that a significant amount of work has been done on devices and algorithms; however, more work is needed on arrays and circuits to bridge the gap between the two.
The primary purpose of this research is to facilitate neuromorphic system design on both the hardware and software levels. Customized PyTorch- and TensorFlow-based frameworks are developed to simulate and analyze the performance of neuromorphic hardware (with a focus on NVM devices and CMOS circuits) from a system-level neural network perspective. Metrics of interest include hardware architecture, the interplay of nonvolatile memory (NVM) devices and mixed-signal circuits, energy consumption, NVM requirements (multilevel performance, endurance, retention, energy consumption, density, CMOS compatibility, etc.), data converter performance (resolution, latency, energy), and overall system-level latency and classification accuracy. In addition, we observed that the proposed Spiking Neural Network (SNN) training frameworks are compatible with the native automatic differentiation found in both PyTorch and TensorFlow. This opens the door to back-propagation-based optimization of SNNs without additional steps such as surrogate gradients.
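To illustrate why native automatic differentiation can apply to SNN training, the following is a minimal, hypothetical sketch (not the dissertation's actual framework, and all names and constants are illustrative): a leaky-integrator neuron whose output is a smooth firing-rate proxy rather than a hard threshold, so the forward pass stays differentiable end to end and frameworks such as PyTorch or TensorFlow can backpropagate through it without surrogate gradients.

```python
import math

def run_neuron(currents, decay=0.9):
    """Leaky integrator with a smooth (sigmoid) rate readout.

    Because every operation here is differentiable, an autodiff
    framework could compute exact gradients through the time loop.
    """
    v, rates = 0.0, []
    for i in currents:
        v = decay * v + i                         # membrane-potential update
        rates.append(1.0 / (1.0 + math.exp(-v)))  # smooth firing-rate proxy
    return sum(rates)                             # scalar "loss" for the demo

# Finite-difference check: the output varies smoothly with the first
# input sample, which is what makes gradient-based training possible.
base = [0.2, -0.1, 0.4]
eps = 1e-6
g = (run_neuron([base[0] + eps] + base[1:]) - run_neuron(base)) / eps
print(round(g, 4))
```

A hard spiking threshold would make this derivative zero almost everywhere, which is precisely the problem surrogate gradients are normally introduced to work around.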
Moreover, we developed two NVM array readout circuits that effectively function as neurons. The first implementation enables a Spiking Convolutional Neural Network (SCNN) by supporting spiking MaxPool. The second uses Current-Controlled Ring Oscillators to implement the integrator of an Integrate & Fire neuron. This design choice eliminates the need for opamps and dedicated capacitors, making the proposed design well suited to implementation in advanced CMOS processes. Finally, we verified that the proposed approach to SNN architecture design achieves competitive speech-denoising performance while maintaining a small model size, using a compact 3D SCNN architecture.
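The oscillator-based integration idea can be sketched behaviorally as follows. This is an illustrative model only (the constants, threshold, and function names are assumptions, not the fabricated circuit's parameters): the ring oscillator's frequency is proportional to its input current, so accumulating oscillation edges integrates charge over time without an opamp or a dedicated capacitor, and an edge-count threshold plays the role of the firing threshold.

```python
def cco_neuron(currents, dt=1e-3, k_hz_per_amp=1e9, threshold=50):
    """Behavioral integrate-and-fire via a current-controlled oscillator.

    Each step accumulates f * dt oscillation edges, where f = k * I.
    Crossing the edge-count threshold emits a spike and resets the count.
    """
    phase, spikes = 0.0, []
    for i in currents:
        phase += k_hz_per_amp * i * dt  # edges accumulated this step
        if phase >= threshold:          # enough edges counted -> fire
            spikes.append(1)
            phase -= threshold          # reset, keeping the remainder
        else:
            spikes.append(0)
    return spikes

out = cco_neuron([20e-6] * 10)          # constant 20 uA input current
print(out)
```

With a constant input, the spike rate settles to a value proportional to the input current, which is the integrator behavior an Integrate & Fire neuron requires.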
Description
Keywords
Citation