Accelerating Polynomial Modular Multiplication with Crossbar-Based Compute-in-Memory
Abstract
Lattice-based cryptographic algorithms built on ring learning with error theory are gaining importance due to their potential for providing post-quantum security. However, these algorithms involve complex polynomial operations, such as polynomial modular multiplication (PMM), which is the most time-consuming part of these algorithms. Accelerating PMM is crucial to make lattice-based cryptographic algorithms widely adopted by more applications. This work introduces a novel high-throughput and compact PMM accelerator, X-Poly, based on the crossbar (XB)-type compute-in-memory (CIM). We identify the most appropriate PMM algorithm for XB-CIM. We then propose a novel bit-mapping technique to reduce the area and energy of the XB-CIM fabric, and conduct processing engine (PE)-level optimization to increase memory utilization and support different problem sizes with a fixed number of XB arrays. X-Poly design achieves 3.1X10^6 PMM operations/s throughput and offers 200X latency improvement compared to the CPU-based implementation. It also achieves 3.9X throughput per area improvement compared with the state-of-the-art CIM accelerators.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2023
- DOI:
- arXiv:
- arXiv:2307.14557
- Bibcode:
- 2023arXiv230714557L
- Keywords:
-
- Computer Science - Cryptography and Security;
- Computer Science - Hardware Architecture
- E-Print:
- Accepted by 42nd International Conference on Computer-Aided Design (ICCAD)