yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25Jul 15, 2025Updated 6 months ago
Alternatives and similar repositories for IceFormer
Users who are interested in IceFormer are comparing it to the libraries listed below.
- ☆13Jan 7, 2025Updated last year
- Longitudinal Evaluation of LLMs via Data Compression☆33May 29, 2024Updated last year
- ☆16Mar 13, 2023Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆17Jun 3, 2024Updated last year
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆21Mar 7, 2024Updated last year
- Whisper in TensorRT-LLM☆17Sep 21, 2023Updated 2 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 6 months ago
- ☆71Mar 26, 2025Updated 10 months ago
- Loop Nest - Linear algebra compiler and code generator.☆21Oct 22, 2022Updated 3 years ago
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning☆19May 31, 2025Updated 8 months ago
- Running inference on the ZeroSCROLLS benchmark☆20Apr 18, 2024Updated last year
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"☆44Apr 7, 2024Updated last year
- QAQ: Quality Adaptive Quantization for LLM KV Cache☆55Mar 27, 2024Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- An open source time series library for Python implementing Matrix Profile☆23Jul 31, 2018Updated 7 years ago
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…☆53Nov 5, 2024Updated last year
- ☆303Jul 10, 2025Updated 7 months ago
- Repo for "Smart Word Suggestions" (SWS) task and benchmark☆20Dec 4, 2023Updated 2 years ago
- ☆21Mar 22, 2021Updated 4 years ago
- KDD21 Deep Learning Embeddings for Data Series Similarity Search☆20Aug 5, 2021Updated 4 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆65Sep 28, 2024Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency☆114Sep 10, 2024Updated last year
- ☆30Jul 22, 2024Updated last year
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM☆60May 28, 2024Updated last year
- Contrastive Chain-of-Thought Prompting☆68Nov 18, 2023Updated 2 years ago
- ☆34Feb 3, 2025Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs☆123Jul 4, 2025Updated 7 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization☆404Aug 13, 2024Updated last year
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆31Apr 8, 2024Updated last year
- ☆32May 26, 2024Updated last year
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference☆82Dec 7, 2025Updated 2 months ago
- FP8 flash attention implemented on the Ada architecture using the cutlass repository☆78Aug 12, 2024Updated last year
- ☆35Jun 15, 2023Updated 2 years ago
- ☆85Apr 18, 2025Updated 9 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆38Jun 11, 2025Updated 8 months ago
- Continual Resilient (CoRe) Optimizer for PyTorch☆11Jun 10, 2024Updated last year
- A layered, decoupled deep learning inference engine☆79Feb 17, 2025Updated 11 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)☆147Sep 20, 2024Updated last year