# PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference

> Research article (Findings of the Association for Computational Linguistics: ACL 2024) · cited 14× · AI/ML

**Wikidata**: [openalex:W4402670433](https://www.wikidata.org/wiki/openalex:W4402670433)  
**Source**: https://4ort.xyz/entity/pyramidinfer-pyramid-kv-cache-compression-for-high-throughput-llm-inference