zankner / Hydra
☆40 · Updated 10 months ago
Alternatives and similar repositories for Hydra:
Users who are interested in Hydra are comparing it to the libraries listed below:
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… ☆47 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆192 · Updated last month
- ☆107 · Updated 3 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆204 · Updated 4 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆152 · Updated 6 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆229 · Updated 3 weeks ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆56 · Updated 3 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆85 · Updated 10 months ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆88 · Updated 11 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆90 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity