NVlabs / hymba
☆208 · Dec 11, 2024 · Updated last year
Alternatives and similar repositories for hymba
Users that are interested in hymba are comparing it to the libraries listed below
- ☆12 · Nov 13, 2024 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆129 · Jun 24, 2025 · Updated 7 months ago
- ☆13 · Dec 15, 2025 · Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆186 · Jan 23, 2025 · Updated last year
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Aug 25, 2023 · Updated 2 years ago
- Some preliminary explorations of Mamba's context scaling. ☆218 · Feb 8, 2024 · Updated 2 years ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆236 · Oct 14, 2025 · Updated 4 months ago
- ☆63 · Oct 3, 2024 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆452 · Sep 15, 2025 · Updated 4 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆366 · Nov 6, 2025 · Updated 3 months ago
- train with kittens! ☆63 · Oct 25, 2024 · Updated last year
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- Alpha-Zero Connect Four NN trained via self play ☆25 · Mar 7, 2025 · Updated 11 months ago
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation ☆16 · Sep 13, 2024 · Updated last year
- ☆16 · Nov 28, 2024 · Updated last year
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆21 · Jan 8, 2025 · Updated last year
- ☆51 · Jan 28, 2024 · Updated 2 years ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · May 4, 2025 · Updated 9 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆357 · Feb 5, 2026 · Updated last week
- Code for BLT research paper ☆2,028 · Nov 3, 2025 · Updated 3 months ago
- ☆108 · Mar 12, 2024 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆89 · Oct 30, 2024 · Updated last year
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆944 · Nov 16, 2025 · Updated 2 months ago
- ☆219 · Jan 23, 2025 · Updated last year
- ☆44 · Nov 1, 2025 · Updated 3 months ago
- Implementation of BitNet-1.58 instruct tuning ☆27 · Apr 14, 2024 · Updated last year
- H-Net: Hierarchical Network with Dynamic Chunking ☆812 · Nov 20, 2025 · Updated 2 months ago
- LLM KV cache compression made easy ☆876 · Jan 28, 2026 · Updated 2 weeks ago
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆587 · Feb 11, 2025 · Updated last year
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆169 · Jan 30, 2025 · Updated last year
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,379 · Updated this week
- ☆158 · Feb 15, 2025 · Updated 11 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆54 · Jan 12, 2026 · Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆47 · Sep 22, 2024 · Updated last year
- RWKV-7: Surpassing GPT ☆104 · Nov 17, 2024 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 · Jun 4, 2024 · Updated last year
- Helpful tools and examples for working with flex-attention ☆1,127 · Updated this week
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- Annotated version of the Mamba paper ☆496 · Feb 27, 2024 · Updated last year