Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆160 · Updated this week
Alternatives and similar repositories for PostTrainBench
Users who are interested in PostTrainBench are comparing it to the libraries listed below
- Extract streaming data from text using prefix completion. ☆10 · Oct 6, 2024 · Updated last year
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training" ☆11 · Oct 27, 2025 · Updated 4 months ago
- ☆70 · Feb 9, 2026 · Updated 3 weeks ago
- ☆12 · Dec 23, 2022 · Updated 3 years ago
- PyTorch implementation of the Gato paper from DeepMind ☆12 · Feb 8, 2023 · Updated 3 years ago
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua… ☆13 · Nov 11, 2024 · Updated last year
- Friday Agents. App: https://chat.toolstack.run/ ☆14 · Dec 18, 2024 · Updated last year
- ☆24 · Jun 18, 2025 · Updated 8 months ago
- Code for paper called Self-Training Elicits Concise Reasoning in Large Language Models ☆42 · Apr 22, 2025 · Updated 10 months ago
- Training tiny models to prove hard theorems ☆29 · Feb 15, 2026 · Updated 2 weeks ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆40 · Feb 5, 2024 · Updated 2 years ago
- MLBench Framework Core Python Library ☆18 · Mar 1, 2023 · Updated 3 years ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆123 · May 6, 2025 · Updated 9 months ago
- Evaluation kit for testing stateful agents ☆52 · Updated this week
- ☆43 · Sep 19, 2024 · Updated last year
- ☆25 · Dec 13, 2024 · Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Jun 3, 2024 · Updated last year
- Fast and memory-efficient exact attention ☆29 · Dec 2, 2024 · Updated last year
- Clue inspired puzzles for testing LLM deduction abilities ☆45 · Mar 24, 2025 · Updated 11 months ago
- ☆24 · Apr 3, 2025 · Updated 10 months ago
- ☆104 · Dec 5, 2025 · Updated 2 months ago
- NexAU (AU for Agent Universe), a general-purpose agent framework for building intelligent agents with tool capabilities. ☆47 · Feb 12, 2026 · Updated 2 weeks ago
- ☆19 · Mar 3, 2025 · Updated 11 months ago
- When Reasoning Meets Its Laws ☆35 · Jan 2, 2026 · Updated 2 months ago
- RAG Agent for the ARC AGI Challenge ☆20 · Jul 1, 2024 · Updated last year
- Code for our ICRA 2024 paper on learning diverse skills ☆25 · Apr 6, 2024 · Updated last year
- ☆352 · Jul 29, 2025 · Updated 7 months ago
- A package with all scripts and commands needed to record joint and ee trajectories (and more) from multiple robots for kinesthetic teachi… ☆26 · May 17, 2022 · Updated 3 years ago
- DS SERVE: The Largest Open Vector Store over Pretain Data; A Framework for Efficient and Scalable Neural Retrieval ☆45 · Jan 28, 2026 · Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆51 · Nov 9, 2024 · Updated last year
- ☆23 · Jul 5, 2024 · Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆36 · Nov 27, 2025 · Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆107 · Mar 6, 2025 · Updated 11 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆26 · Feb 11, 2026 · Updated 2 weeks ago
- Ludic – an LLM-RL library for the era of experience ☆60 · Jan 9, 2026 · Updated last month
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling" ☆25 · Mar 28, 2024 · Updated last year
- speed-running solving robot manipulation tasks ☆24 · Oct 31, 2024 · Updated last year
- Write a fast kernel and see how you compare against the best humans and AI on gpumode.com ☆77 · Updated this week
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆28 · Jul 9, 2025 · Updated 7 months ago