project-numina / aimo-progress-prize
☆465 Updated last year
Alternatives and similar repositories for aimo-progress-prize
Users who are interested in aimo-progress-prize are comparing it to the libraries listed below.
- Technical report of Kimina-Prover Preview. ☆335 Updated 3 months ago
- A project to improve skills of large language models ☆571 Updated this week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆519 Updated last week
- Automatic evals for LLMs ☆539 Updated 3 months ago
- ☆209 Updated 6 months ago
- ☆956 Updated 3 months ago
- Evaluation of LLMs on latest math competitions ☆171 Updated 3 weeks ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆328 Updated 10 months ago
- ☆1,034 Updated 9 months ago
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark ☆413 Updated last year
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,168 Updated last week
- ☆163 Updated last year
- ☆540 Updated last year
- ☆541 Updated 10 months ago
- ☆342 Updated 4 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆258 Updated last year
- Retrieval-Augmented Theorem Provers for Lean ☆298 Updated 8 months ago
- Reproducible, flexible LLM evaluations ☆252 Updated 3 months ago
- PyTorch building blocks for the OLMo ecosystem ☆305 Updated this week
- Large Reasoning Models ☆804 Updated 10 months ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆282 Updated last year
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆265 Updated 5 months ago
- RewardBench: the first evaluation tool for reward models. ☆639 Updated 4 months ago
- [MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning. ☆321 Updated this week
- [NeurIPS D&B 2024] Generative AI for Math: MathPile ☆417 Updated 6 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆283 Updated this week
- Code for Quiet-STaR ☆740 Updated last year
- A simple unified framework for evaluating LLMs ☆250 Updated 5 months ago
- ☆962 Updated 8 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,005 Updated this week