allenai / FlexOlmo
Code and training scripts for FlexOlmo
☆122 · Updated last week
Alternatives and similar repositories for FlexOlmo
Users interested in FlexOlmo are comparing it to the repositories listed below.
- Esoteric Language Models ☆109 · Updated 2 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆43 · Updated last year
- ☆63 · Updated 7 months ago
- PeRL: Parameter-Efficient Reinforcement Learning ☆68 · Updated last week
- Defeating the Training-Inference Mismatch via FP16 ☆180 · Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- PostTrainBench measures how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours ☆127 · Updated this week
- ☆35 · Updated 8 months ago
- ☆82 · Updated 2 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆214 · Updated 2 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Updated last month
- Process Reward Models That Think ☆77 · Updated 2 months ago
- ☆98 · Updated 3 weeks ago
- ☆38 · Updated 5 months ago
- Official repo of paper LM2 ☆46 · Updated 11 months ago
- ☆112 · Updated last year
- MatFormer repo ☆70 · Updated last year
- [TMLR 2026] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models ☆121 · Updated 11 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆58 · Updated last week
- EvaByte: Efficient Byte-level Language Models at Scale ☆115 · Updated 9 months ago
- ☆91 · Updated last year
- A repository for research on medium sized language models. ☆77 · Updated last year
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆68 · Updated 9 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆157 · Updated 9 months ago
- The HELMET Benchmark ☆198 · Updated last month
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆182 · Updated 7 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Updated last year
- Official code release for "SuperBPE: Space Travel for Language Models" ☆86 · Updated 3 weeks ago
- ☆85 · Updated 2 months ago
- Verifiers for LLM Reinforcement Learning ☆80 · Updated 9 months ago