itsdaniele / speculative_mamba
☆15 · Updated last year
Alternatives and similar repositories for speculative_mamba
Users interested in speculative_mamba are comparing it to the libraries listed below.
- ☆18 · Updated last year
- ☆11 · Updated 11 months ago
- KV cache compression via sparse coding ☆14 · Updated last month
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆15 · Updated 4 months ago
- Fast and memory-efficient exact attention ☆74 · Updated 8 months ago
- ☆43 · Updated last week
- Experiments on Multi-Head Latent Attention ☆99 · Updated last year
- ☆110 · Updated last week
- The evaluation framework for training-free sparse attention in LLMs ☆104 · Updated last month
- Transformers components but in Triton ☆34 · Updated 6 months ago
- ☆48 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆91 · Updated 4 months ago
- Normalized Transformer (nGPT) ☆194 · Updated last year
- Work in progress. ☆75 · Updated 5 months ago
- Code for studying the super weight in LLM ☆121 · Updated 11 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆140 · Updated 2 weeks ago
- ☆132 · Updated 6 months ago
- The official repository of Quamba1 [ICLR 2025] & Quamba2 [ICML 2025] ☆60 · Updated 5 months ago
- ☆66 · Updated last week
- ☆151 · Updated 9 months ago
- ☆83 · Updated 10 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆126 · Updated 5 months ago
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆150 · Updated 4 months ago
- Official implementation for Training LLMs with MXFP4 ☆109 · Updated 7 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆206 · Updated 5 months ago
- ☆10 · Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Updated 9 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆54 · Updated last year
- Awesome Triton Resources ☆36 · Updated 7 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆231 · Updated last month