An innovative method for accelerating LLM inference via streamlined semi-autoregressive generation and draft verification.
☆26 (updated Apr 15, 2025)
Alternatives and similar repositories for BiTA
Users interested in BiTA are comparing it to the repositories listed below.
- Cascade Speculative Drafting ☆33 (updated Apr 2, 2024)
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆115 (updated Mar 20, 2025)
- Efficient LLM Inference Acceleration using Prompting ☆51 (updated Oct 22, 2024)
- ☆16 (updated Dec 21, 2023)
- ☆13 (updated May 11, 2023)
- [ICCAD 2025] Squant ☆15 (updated Jul 3, 2025)
- ☆11 (updated Sep 20, 2024)
- ☆16 (updated Dec 9, 2023)
- Source code for a LoRA-based continual relation extraction method ☆14 (updated Sep 25, 2023)
- Fork of the Flame repo for training some new work in development ☆19 (updated this week)
- ☆11 (updated Feb 5, 2026)
- Multi-Candidate Speculative Decoding ☆39 (updated Apr 22, 2024)
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp… ☆21 (updated Mar 7, 2024)
- ☆129 (updated Jan 22, 2024)
- Implementation of "Decoding-time Realignment of Language Models", ICML 2024 ☆21 (updated Jun 17, 2024)
- Scalable and robust tree-based speculative decoding algorithm ☆370 (updated Jan 28, 2025)
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆277 (updated Aug 31, 2024)
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 (updated Jun 27, 2025)
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 (updated Jul 19, 2024)
- Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation ☆25 (updated Feb 18, 2025)
- ☆26 (updated Nov 23, 2023)
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆65 (updated Sep 28, 2024)
- ☆66 (updated Nov 4, 2024)
- ☆30 (updated Jul 22, 2024)
- Dataset with coverage annotations for the HumanEval dataset ☆24 (updated Aug 17, 2023)
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… ☆66 (updated Jun 26, 2024)
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 (updated Jun 12, 2024)
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 (updated Apr 29, 2025)
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" ☆216 (updated Feb 13, 2025)
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆143 (updated Dec 4, 2024)
- The Multitask Long Document Benchmark ☆42 (updated Nov 2, 2022)
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 (updated Nov 29, 2023)
- ☆221 (updated Jan 23, 2025)
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ☆30 (updated Dec 6, 2023)
- Official implementation of EAGLE-1 (ICML '24), EAGLE-2 (EMNLP '24), and EAGLE-3 (NeurIPS '25) ☆2,201 (updated Feb 20, 2026)
- Code for "APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training" ☆38 (updated Dec 23, 2025)
- Continual Resilient (CoRe) Optimizer for PyTorch ☆11 (updated Jun 10, 2024)
- Cross-platform Python client for the CodeReef.ai portal to manage portable workflows, reusable automation actions, software detection plu… ☆11 (updated Mar 27, 2020)
- Homework in SCUT_SE ☆12 (updated Nov 9, 2021)
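Many of the repositories above (BiTA, Ouroboros, TriForce, Kangaroo, Draft & Verify, EAGLE) build on the same draft-then-verify loop from speculative decoding: a cheap draft model proposes several tokens, and the target model verifies them so the output matches what the target alone would produce. The following is a minimal greedy-decoding sketch of that loop, not the implementation of any listed project; the toy `target` and `draft` callables stand in for real language models.

```python
from typing import Callable, List

def speculative_generate(
    target: Callable[[List[int]], int],  # stand-in for the large model's greedy next token
    draft: Callable[[List[int]], int],   # stand-in for the cheap draft model
    prompt: List[int],
    max_new: int = 8,
    k: int = 4,                          # tokens proposed per draft round
) -> List[int]:
    """Greedy draft-and-verify loop (illustrative sketch only).

    The draft model proposes up to k tokens autoregressively; the target
    model keeps the longest matching prefix of the proposal and substitutes
    its own token at the first mismatch. The result is token-for-token
    identical to greedy decoding with `target` alone.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft phase: propose k candidate tokens with the cheap model.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify phase: accept proposals while they match the target's choice.
        for t in proposal:
            expected = target(seq)
            if t == expected:
                seq.append(t)            # accepted draft token
            else:
                seq.append(expected)     # target's correction ends this round
                break
            if len(seq) - len(prompt) >= max_new:
                break
    return seq[len(prompt):]
```

The speedup in real systems comes from the verify phase scoring all k draft tokens in a single batched forward pass of the target model, rather than one call per token as in this sketch.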