microsoft / Samba

[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

☆917

Alternatives and similar repositories for Samba

Users who are interested in Samba are comparing it to the libraries listed below.

seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆838 Updated 2 weeks ago
redotvideo / mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
☆933 Updated last year
Haiyang-W / TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆576 Updated 8 months ago
dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆884 Updated last week
allenai / OLMoE
OLMoE: Open Mixture-of-Experts Language Models
☆888 Updated last month
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,274 Updated last month
dingo-actual / infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…
☆292 Updated last year
facebookresearch / blt
Code for BLT research paper
☆1,999 Updated 5 months ago
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆560 Updated 10 months ago
stanfordnlp / pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
☆1,518 Updated 8 months ago
arcee-ai / DistillKit
An Open Source Toolkit For LLM Distillation
☆744 Updated 3 months ago
huggingface / search-and-learn
Recipes to scale inference-time compute of open models
☆1,114 Updated 5 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆347 Updated 10 months ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆344 Updated 5 months ago
jiaweizzhao / GaLore
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
☆1,612 Updated last year
google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆653 Updated 4 months ago
NVlabs / Minitron
A family of compressed models obtained via pruning and knowledge distillation
☆354 Updated 11 months ago
mlfoundations / open_lm
A repository for research on medium sized language models.
☆515 Updated 4 months ago
facebookresearch / coconut
Training Large Language Models to Reason in a Continuous Latent Space
☆1,313 Updated 2 months ago
nomic-ai / contrastors
Train Models Contrastively in Pytorch
☆753 Updated 7 months ago
facebookresearch / SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
☆828 Updated 3 weeks ago
XueFuzhao / OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,620 Updated last year
xfactlab / orpo
Official repository for ORPO
☆464 Updated last year
SakanaAI / self-adaptive-llms
A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!
☆1,159 Updated 9 months ago
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆820 Updated 3 months ago
huggingface / cosmopedia
☆546 Updated 11 months ago
NVlabs / hymba
☆201 Updated 10 months ago
trotsky1997 / MathBlackBox
☆1,035 Updated 10 months ago
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆542 Updated 5 months ago
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆440 Updated 5 months ago