microsoft / Samba
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
☆876 · Updated last month
Alternatives and similar repositories for Samba
Users interested in Samba are comparing it to the repositories listed below.
- [ICLR 2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆559 · Updated 3 months ago
- Minimalistic large language model 3D-parallelism training ☆1,898 · Updated this week
- Muon optimizer: >30% sample efficiency with <3% wallclock overhead ☆661 · Updated this week
- Open weights language model from Google DeepMind, based on Griffin. ☆639 · Updated last week
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆922 · Updated last year
- Code for BLT research paper ☆1,664 · Updated last week
- A repository for research on medium-sized language models. ☆495 · Updated 3 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆513 · Updated 2 weeks ago
- Pretraining code for a large-scale depth-recurrent language model ☆770 · Updated this week
- Recipes to scale inference-time compute of open models ☆1,087 · Updated last week
- Annotated version of the Mamba paper ☆482 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (see the sketch after this list) ☆333 · Updated 5 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,534 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆814 · Updated last week
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆550 · Updated 5 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,563 · Updated last week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆725 · Updated 8 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,193 · Updated 6 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆341 · Updated 6 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,560 · Updated 7 months ago
- ☆1,024 · Updated 5 months ago
- System 2 Reasoning Link Collection ☆834 · Updated 2 months ago
- Large Context Attention ☆711 · Updated 4 months ago
- ☆517 · Updated 6 months ago
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! ☆1,066 · Updated 4 months ago
- Minimalistic 4D-parallelism distributed training framework for education purposes ☆1,505 · Updated 2 months ago
- Train Models Contrastively in Pytorch ☆713 · Updated 2 months ago
- Code for Quiet-STaR ☆732 · Updated 9 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,384 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,489 · Updated last year
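The memory-layers entry above describes a concrete mechanism: a large trainable key-value table that each token queries sparsely, so parameter count grows while per-token compute stays roughly flat. Below is a minimal PyTorch sketch of that idea, assuming a naive top-k lookup; the names `MemoryLayer`, `num_slots`, and `top_k` are illustrative, not that repo's API, and production variants (e.g. product-key memories) additionally make the key scoring sub-linear in the number of slots.

```python
# Minimal sketch of a sparse key-value memory layer (illustrative, not the repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    """Many trainable key/value slots; only top_k of them are read per token."""

    def __init__(self, d_model: int, num_slots: int = 4096, top_k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) / d_model ** 0.5)
        self.values = nn.Parameter(torch.randn(num_slots, d_model) / d_model ** 0.5)
        self.query_proj = nn.Linear(d_model, d_model)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                     # queries from hidden states
        scores = q @ self.keys.T                   # naive full scoring; product-key
                                                   # schemes avoid this O(num_slots) cost
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)    # sparse attention over top_k slots
        selected = self.values[top_idx]            # (batch, seq, top_k, d_model)
        # Only top_k value rows reach the output, so the value table can grow
        # without growing the per-token FLOPs of this gather-and-mix step.
        return x + (weights.unsqueeze(-1) * selected).sum(dim=-2)

# Illustrative usage: a drop-in residual block on a batch of hidden states.
layer = MemoryLayer(d_model=64, num_slots=1024, top_k=4)
out = layer(torch.randn(2, 16, 64))  # -> shape (2, 16, 64)
```

The design trade-off this sketch highlights is capacity versus compute: scaling `num_slots` adds parameters almost for free at inference, whereas a dense FFN of comparable size would add FLOPs proportionally.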