allenai / OLMoE

OLMoE: Open Mixture-of-Experts Language Models

☆916

Alternatives and similar repositories for OLMoE

Users who are interested in OLMoE are comparing it to the repositories listed below.

MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,372 Updated 4 months ago
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆807 Updated last year
NVIDIA-NeMo / Skills
A project to improve skills of large language models
☆628 Updated this week
huggingface / search-and-learn
Recipes to scale inference-time compute of open models
☆1,118 Updated 6 months ago
zhentingqi / rStar
☆966 Updated 10 months ago
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆750 Updated last year
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,164 Updated 3 months ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆1,287 Updated last week
NVIDIA / NeMo-Aligner
Scalable toolkit for efficient model alignment
☆847 Updated last month
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆1,053 Updated 4 months ago
microsoft / MInference
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,163 Updated 2 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆482 Updated this week
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆456 Updated 6 months ago
ByteDance-Seed / Seed-Thinking-v1.5
☆819 Updated 5 months ago
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆1,048 Updated last week
DreamLM / Dream
Dream 7B, a large diffusion language model
☆1,094 Updated last week
facebookresearch / coconut
Training Large Language Model to Reason in a Continuous Latent Space
☆1,367 Updated 3 months ago
huggingface / Math-Verify
☆1,015 Updated 5 months ago
NVlabs / Minitron
A family of compressed models obtained via pruning and knowledge distillation
☆357 Updated 3 weeks ago
arcee-ai / DistillKit
An Open Source Toolkit For LLM Distillation
☆785 Updated 4 months ago
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆793 Updated 8 months ago
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,351 Updated last week
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆507 Updated 9 months ago
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆576 Updated last month
seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆850 Updated last month
mlfoundations / evalchemy
Automatic evals for LLMs
☆559 Updated 5 months ago
AIDC-AI / Marco-o1
An Open Large Reasoning Model for Real-World Solutions
☆1,528 Updated 6 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆360 Updated 11 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆2,612 Updated this week
princeton-nlp / LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆632 Updated last year