Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation

doi:10.48550/arXiv.2412.16537

Homomorphic encryption (HE) and secret sharing (SS) enable computations on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors like medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach to accelerate private TBM inference through fine-grained computation optimization. Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. In addition, FASTLMPI introduces a precise segmented approximation technique for differentiable non-linear functions, improving fitting accuracy while maintaining a low polynomial degree. Compared to the BOLT solution (S&P'24), FASTLMPI shows a 54% to 64% decrease in runtime and a 72.2% reduction in communication costs.
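To illustrate the general idea behind a segmented low-degree approximation of a non-linear function such as GeLU, here is a minimal plaintext sketch. It is not FASTLMPI's protocol: the breakpoints, the degree of 3, and the use of Chebyshev-node interpolation are assumptions chosen for illustration, and the segment selection and polynomial evaluation, which a private protocol would carry out under HE or SS, are done here in the clear.

```python
import math

def gelu(x):
    # Exact GeLU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def chebyshev_nodes(a, b, n):
    # n Chebyshev nodes mapped from [-1, 1] onto [a, b].
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def lagrange_eval(nodes, values, x):
    # Evaluate the interpolating polynomial through (nodes, values) at x.
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical segmentation: four segments, one cubic per segment.
BREAKPOINTS = [-5.0, -2.0, 0.0, 2.0, 5.0]
DEGREE = 3

SEGMENTS = []
for a, b in zip(BREAKPOINTS, BREAKPOINTS[1:]):
    nodes = chebyshev_nodes(a, b, DEGREE + 1)
    SEGMENTS.append((a, b, nodes, [gelu(t) for t in nodes]))

def gelu_segmented(x):
    # Outside the fitted range, fall back to the asymptotes of GeLU.
    if x < BREAKPOINTS[0]:
        return 0.0
    if x > BREAKPOINTS[-1]:
        return x
    for a, b, nodes, values in SEGMENTS:
        if a <= x <= b:
            return lagrange_eval(nodes, values, x)

def max_error(samples=2001):
    # Largest absolute deviation from exact GeLU on a grid over [-8, 8].
    xs = [-8.0 + 16.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(gelu(x) - gelu_segmented(x)) for x in xs)

if __name__ == "__main__":
    print("max |gelu - segmented| on [-8, 8]:", max_error())
```

Splitting the domain keeps each polynomial at a low degree while the fit follows the curved region around zero closely. The trade-off is that choosing the segment requires comparisons, which are themselves costly in a private setting.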

Publication:

arXiv e-prints

Pub Date:

December 2024

DOI:

10.48550/arXiv.2412.16537

arXiv:

arXiv:2412.16537

Bibcode:

2024arXiv241216537C

Keywords:

Computer Science - Cryptography and Security

E-Print:

14 pages (with 4-page appendix)
