Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation
Abstract
Homomorphic encryption (HE) and secret sharing (SS) enable computations on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors like medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach to accelerate private TBM inference through fine-grained computation optimization. Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. In addition, FASTLMPI introduces a precise segmented approximation technique for differentiable non-linear functions, improving fitting accuracy while maintaining a low polynomial degree. Compared to the BOLT solution (S&P'24), FASTLMPI shows a remarkable 54% to 64% decrease in runtime and an impressive 72.2% reduction in communication costs.
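To make the idea of a segmented, low-degree polynomial approximation concrete, here is a minimal plaintext sketch for GeLU. It is not the FASTLMPI protocol: the breakpoints, the degree, the use of Chebyshev-node interpolation, and every function name below are assumptions chosen for illustration, and a private-inference setting would additionally need a secure comparison to select the segment.

```python
import math

def gelu_exact(x):
    """Reference GeLU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed segmentation and degree (illustrative, not taken from the paper).
BREAKPOINTS = [-4.0, -2.0, 0.0, 2.0, 4.0]
DEGREE = 3

def chebyshev_nodes(lo, hi, n):
    """n Chebyshev nodes mapped onto the interval [lo, hi]."""
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def build_segments(breakpoints=BREAKPOINTS, degree=DEGREE):
    """For each segment, store interpolation nodes and GeLU values there."""
    segments = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        nodes = chebyshev_nodes(lo, hi, degree + 1)
        segments.append((lo, hi, nodes, [gelu_exact(t) for t in nodes]))
    return segments

def lagrange_eval(nodes, values, x):
    """Evaluate the interpolating polynomial through (nodes, values) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def gelu_segmented(x, segments):
    """Piecewise approximation: 0 on the left tail, x on the right tail,
    and a degree-3 polynomial on each inner segment."""
    if x < segments[0][0]:
        return 0.0
    if x > segments[-1][1]:
        return x
    for lo, hi, nodes, values in segments:
        if lo <= x <= hi:
            return lagrange_eval(nodes, values, x)
    raise ValueError("x must be a finite number")

if __name__ == "__main__":
    segs = build_segments()
    grid = [-6.0 + 0.01 * i for i in range(1201)]
    worst = max(abs(gelu_segmented(x, segs) - gelu_exact(x)) for x in grid)
    print(f"max abs error on [-6, 6]: {worst:.5f}")
```

Narrower segments lower the error without raising the degree, which is the trade-off the abstract describes. In a secure protocol the polynomial degree drives the number of multiplications, while the number of segments drives the number of comparisons.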
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.16537
- Bibcode: 2024arXiv241216537C
- Keywords: Computer Science - Cryptography and Security
- E-Print: 14 pages (with 4 pages appendix)