allenai / OLMoE
OLMoE: Open Mixture-of-Experts Language Models
☆704 · Updated last month
Alternatives and similar repositories for OLMoE:
Users interested in OLMoE are comparing it to the libraries listed below.
- Large Reasoning Models (☆800, updated 4 months ago)
- ☆921, updated 2 months ago
- Muon is Scalable for LLM Training (☆1,020, updated 2 weeks ago)
- Understanding R1-Zero-Like Training: A Critical Perspective (☆845, updated this week)
- ☆513, updated this week
- Recipes to scale inference-time compute of open models (☆1,051, updated last month)
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware (☆715, updated 6 months ago)
- [NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLMs' inference, approximately and dynamically sparse-calculate the attention, which r… (☆969, updated last week)
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning (☆354, updated 7 months ago)
- ☆617, updated 2 weeks ago
- LIMO: Less is More for Reasoning (☆905, updated last week)
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … (☆673, updated 3 weeks ago)
- A family of compressed models obtained via pruning and knowledge distillation (☆333, updated 5 months ago)
- A series of technical reports on Slow Thinking with LLM (☆624, updated last week)
- ☆452, updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends (☆1,414, updated this week)
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" (☆495, updated 3 weeks ago)
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) (☆609, updated 2 months ago)
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs (a minimal sketch of the idea follows this list). Conceptually, spars… (☆314, updated 4 months ago)
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (☆447, updated 2 months ago)
- Pretraining code for a large-scale depth-recurrent language model (☆734, updated this week)
- An Open Large Reasoning Model for Real-World Solutions (☆1,482, updated last month)
- [ICML 2024] CLLMs: Consistency Large Language Models (☆390, updated 4 months ago)
- Automatic evals for LLMs (☆361, updated this week)
- ☆1,014, updated 3 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" (☆402, updated 5 months ago)
- FuseAI Project (☆562, updated 2 months ago)
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" (☆457, updated last year)
- ☆509, updated 4 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (☆600, updated last year)
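The memory-layer entry above describes the core mechanism: a trainable key-value lookup that adds parameters to the model while each token only touches a few slots, so per-token FLOPs stay low. Below is a minimal, illustrative PyTorch sketch of that general idea, not the referenced repository's implementation; the class name, slot count, and top-k value are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    """Toy memory layer (illustrative sketch, not the referenced repo's code):
    most extra parameters sit in the key/value tables, but each token reads
    only its top-k slots, so the per-token compute added is small."""

    def __init__(self, d_model: int, num_slots: int, top_k: int = 4):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model, bias=False)
        self.keys = nn.Parameter(0.02 * torch.randn(num_slots, d_model))  # trainable keys
        self.values = nn.Embedding(num_slots, d_model)                    # trainable values
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                                 # (B, S, D)
        # NB: this dense scoring is O(num_slots) per token; real memory layers
        # use product-key / approximate lookup to keep retrieval cheap.
        scores = q @ self.keys.t()                             # (B, S, num_slots)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)  # pick k slots per token
        weights = F.softmax(top_scores, dim=-1)                # (B, S, k)
        selected = self.values(top_idx)                        # (B, S, k, D) gathered values
        return x + (weights.unsqueeze(-1) * selected).sum(dim=-2)

# Example usage (shapes are arbitrary):
layer = MemoryLayer(d_model=512, num_slots=16384)
out = layer(torch.randn(2, 8, 512))   # -> (2, 8, 512)
```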