OpenBMB / OlympiadBench
[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems.
☆92 Updated 3 months ago
Related projects
Alternatives and complementary repositories for OlympiadBench
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆82 Updated 4 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆75 Updated last month
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆96 Updated 6 months ago
- ☆68 Updated 4 months ago
- ☆98 Updated 5 months ago
- The official repository of the Omni-MATH benchmark. ☆47 Updated last week
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆84 Updated 4 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆216 Updated 2 months ago
- ☆241 Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆127 Updated last month
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆143 Updated 4 months ago
- Evaluating Mathematical Reasoning Beyond Accuracy ☆37 Updated 7 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆67 Updated 5 months ago
- ☆51 Updated 7 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆91 Updated 4 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆189 Updated 3 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆90 Updated this week
- ☆24 Updated 2 weeks ago
- ☆113 Updated 3 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆113 Updated last week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆61 Updated 3 weeks ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆111 Updated this week
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆166 Updated last month
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆115 Updated 4 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆67 Updated last month
- MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems ☆70 Updated 3 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆141 Updated 4 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆26 Updated 4 months ago
- Reformatted Alignment ☆112 Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆72 Updated 9 months ago