microsoft / Samba
Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
☆831 · Updated last month
Alternatives and similar repositories for Samba:
Users interested in Samba are comparing it to the libraries listed below.
- Minimalistic large language model 3D-parallelism training ☆1,386 · Updated this week
- Code for BLT research paper ☆1,314 · Updated this week
- Recipes to scale inference-time compute of open models ☆932 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for education purposes ☆644 · Updated this week
- Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆477 · Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆531 · Updated last month
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆970 · Updated this week
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" ☆286 · Updated 8 months ago
- An Open Source Toolkit For LLM Distillation ☆425 · Updated last week
- A bibliography and survey of the papers surrounding o1 ☆1,042 · Updated 2 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,481 · Updated 2 months ago
- System 2 Reasoning Link Collection ☆722 · Updated this week
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,425 · Updated 10 months ago
- A repository for research on medium-sized language models. ☆484 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆732 · Updated this week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆687 · Updated 3 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from Meta AI ☆1,358 · Updated 9 months ago
- ☆484Updated last month
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆916 · Updated 10 months ago
- Code for Quiet-STaR ☆698 · Updated 4 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply (a minimal sketch of the idea follows after this list). ☆277 · Updated last month
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆541 · Updated 2 weeks ago
- Large Context Attention ☆670 · Updated 5 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,179 · Updated 3 months ago
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,099 · Updated 8 months ago
- Scalable toolkit for efficient model alignment ☆674 · Updated this week
- [NeurIPS'24 Spotlight] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy (a toy sketch follows after this list). ☆874 · Updated 2 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆492 · Updated 2 months ago
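
For readers skimming the list, here is a rough, self-contained sketch of the memory-layer idea mentioned in the entry above. It is not the code from the linked repository; the class name `MemoryLayer` and the `num_slots`/`topk` parameters are illustrative assumptions, and real memory layers use product-key lookup so retrieval does not scan every slot.

```python
# Minimal sketch of a sparsely activated memory layer (illustrative only,
# not the linked repository's implementation). Extra capacity lives in a
# large trainable key/value table; each token reads only `topk` slots.
# NOTE: real memory layers use product-key lookup to avoid scoring every
# slot; the dense scoring below is kept for clarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    def __init__(self, dim: int, num_slots: int = 4096, topk: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = x @ self.keys.t()                # (batch, seq, num_slots)
        w, idx = scores.topk(self.topk, dim=-1)   # top-k slots per token
        w = F.softmax(w, dim=-1)
        v = self.values[idx]                      # (batch, seq, topk, dim)
        return (w.unsqueeze(-1) * v).sum(dim=-2)  # (batch, seq, dim)

layer = MemoryLayer(dim=512)
out = layer(torch.randn(2, 16, 512))              # -> (2, 16, 512)
```

Growing `num_slots` adds parameters while the per-token gather stays at `topk` value vectors, which is the FLOPs-vs-capacity trade the description refers to.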
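Similarly, a toy version of the dynamic sparse attention idea behind the NeurIPS'24 Spotlight entry, under stated assumptions: the block size, the mean-pooling heuristic for estimating important key blocks, and the function name `block_sparse_attention` are all illustrative, and the sketch omits causal masking and the kernel-level optimizations the real project relies on.

```python
# Toy sketch of dynamic sparse attention (illustrative, not the project's
# kernels). Each query block estimates which key blocks matter by scoring
# mean-pooled block summaries, then runs exact attention on only those
# blocks. Causal masking and GPU-kernel details are omitted.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, top_blocks=4):
    # q, k, v: (seq, dim); seq assumed divisible by `block` for simplicity.
    s, d = q.shape
    nb = s // block
    qb = q.view(nb, block, d).mean(dim=1)   # pooled query-block summaries
    kb = k.view(nb, block, d).mean(dim=1)   # pooled key-block summaries
    est = qb @ kb.t()                       # (nb, nb) approximate scores
    keep = est.topk(min(top_blocks, nb), dim=-1).indices

    out = torch.zeros_like(q)
    for i in range(nb):
        idx = [int(j) for j in keep[i]]     # key blocks kept for block i
        ks = torch.cat([k[j * block:(j + 1) * block] for j in idx])
        vs = torch.cat([v[j * block:(j + 1) * block] for j in idx])
        qs = q[i * block:(i + 1) * block]
        att = F.softmax(qs @ ks.t() / d ** 0.5, dim=-1)
        out[i * block:(i + 1) * block] = att @ vs
    return out

q = k = v = torch.randn(256, 64)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([256, 64])
```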