LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

doi:10.48550/arXiv.2401.16603

This paper describes LeftoverLocals, a vulnerability that allows recovery of data that another process left in GPU memory on Apple, Qualcomm, and AMD GPUs. LeftoverLocals affects the security posture of GPU applications, with particular significance for LLMs and other ML models that run on affected GPUs. By recovering local memory, an optimized GPU memory region, we built a proof of concept (PoC) in which an attacker can listen in on another user's interactive LLM session (e.g., llama.cpp) across process or container boundaries.

Publication:

arXiv e-prints

Pub Date:

January 2024

DOI:

10.48550/arXiv.2401.16603

arXiv:

arXiv:2401.16603

Bibcode:

2024arXiv240116603S

Keywords:

Computer Science - Cryptography and Security;
Computer Science - Distributed, Parallel, and Cluster Computing

Abstract