project-numina / aimo-progress-prize
☆423 · Updated 9 months ago
Alternatives and similar repositories for aimo-progress-prize:
Users interested in aimo-progress-prize are comparing it to the libraries listed below.
- Technical report of Kimina-Prover Preview. ☆231 · Updated last week
- ☆647 · Updated 3 weeks ago
- ☆519 · Updated last week
- Automatic evals for LLMs ☆373 · Updated this week
- Large Reasoning Models ☆802 · Updated 4 months ago
- LIMO: Less is More for Reasoning ☆920 · Updated 2 weeks ago
- ☆326 · Updated 2 months ago
- ☆1,015 · Updated 4 months ago
- ☆922 · Updated 3 months ago
- A project to improve skills of large language models ☆295 · Updated this week
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆304 · Updated 5 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆325 · Updated this week
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆203 · Updated 11 months ago
- ☆175 · Updated 3 weeks ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆882 · Updated last week
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆681 · Updated last month
- Verifiers for LLM Reinforcement Learning ☆827 · Updated 3 weeks ago
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ☆264 · Updated 10 months ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆873 · Updated 2 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,187 · Updated 5 months ago
- RewardBench: the first evaluation tool for reward models. ☆555 · Updated last month
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆509 · Updated last month
- ☆148 · Updated 11 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆317 · Updated 4 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆685 · Updated last week
- ☆515 · Updated 5 months ago
- ☆75 · Updated 9 months ago
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆459 · Updated last year
- ☆493 · Updated 8 months ago
- ☆283 · Updated last month