project-numina / aimo-progress-prize
☆373 Updated 6 months ago
Alternatives and similar repositories for aimo-progress-prize:
Users who are interested in aimo-progress-prize are comparing it to the repositories listed below.
- ☆326 Updated 5 months ago
- ☆489 Updated 2 months ago
- ☆304 Updated last week
- ☆136 Updated 8 months ago
- Code for the NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆177 Updated last month
- Implementation of the paper 'Data Engineering for Scaling Language Models to 128K Context' ☆450 Updated 10 months ago
- ☆65 Updated 6 months ago
- ☆999 Updated last month
- ☆868 Updated last week
- A project to improve skills of large language models ☆239 Updated this week
- Recipes to scale inference-time compute of open models ☆975 Updated 2 weeks ago
- OLMoE: Open Mixture-of-Experts Language Models ☆539 Updated last month
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆154 Updated 9 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆292 Updated this week
- Large Reasoning Models ☆801 Updated last month
- RewardBench: the first evaluation tool for reward models. ☆494 Updated this week
- [NeurIPS D&B 2024] Generative AI for Math: MathPile ☆405 Updated 3 months ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆257 Updated 8 months ago
- Official repository for ICLR 2025 paper "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient an… ☆587 Updated last week
- The official evaluation suite and dynamic data release for MixEval. ☆233 Updated 2 months ago
- Code for Quiet-STaR ☆706 Updated 5 months ago
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆117 Updated 6 months ago
- ☆301 Updated 4 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆291 Updated last month
- Reproducible, flexible LLM evaluations ☆129 Updated last month
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆251 Updated 2 weeks ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024) ☆204 Updated 8 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆285 Updated 2 months ago
- Family of LLMs for mathematical reasoning. ☆245 Updated last month