clevercool / ANT-Quantization
☆113 Updated 2 years ago
Alternatives and similar repositories for ANT-Quantization
Users who are interested in ANT-Quantization are comparing it to the libraries listed below.
- ☆35 Updated last month
- ☆30 Updated 3 months ago
- A co-design architecture on sparse attention ☆55 Updated 4 years ago
- ☆48 Updated 4 years ago
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ☆122 Updated last year
- ☆223 Updated 3 months ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆24 Updated last year
- NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing ☆109 Updated last year
- An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences ☆31 Updated last year
- H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference