KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆47 · Updated last week
Related projects
Alternatives and complementary repositories for Omni-MATH
- Evaluating Mathematical Reasoning Beyond Accuracy ☆37 · Updated 7 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆67 · Updated 5 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆61 · Updated 3 weeks ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆30 · Updated 3 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆47 · Updated 2 weeks ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆85 · Updated last month
- Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆61 · Updated last week
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆81 · Updated 4 months ago
- ☆53 · Updated 2 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆75 · Updated last month
- ☆16 · Updated 3 weeks ago
- ☆28 · Updated 2 weeks ago
- ☆25 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆113 · Updated last week
- trending projects & awesome papers about data-centric llm studies. ☆32 · Updated this week
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆111 · Updated this week
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆68 · Updated 3 weeks ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆56 · Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆94 · Updated 6 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆127 · Updated last month
- Reformatted Alignment ☆112 · Updated last month
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models ☆66 · Updated 3 weeks ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆32 · Updated 10 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆43 · Updated last week
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆32 · Updated 3 months ago
- ☆51 · Updated 7 months ago
- The code and data for the paper JiuZhang3.0 ☆35 · Updated 5 months ago
- ☆15 · Updated last month
- BeHonest: Benchmarking Honesty in Large Language Models ☆29 · Updated 2 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆44 · Updated 5 months ago