ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆208 Updated last year
Alternatives and similar repositories for math-evaluation-harness:
Users who are interested in math-evaluation-harness are comparing it to the repositories listed below
- ☆192 Updated 2 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆122 Updated 9 months ago
- ☆150 Updated 4 months ago
- ☆287 Updated last month
- Repo of paper "Free Process Rewards without Process Labels" ☆145 Updated last month
- ☆327 Updated 2 months ago
- ☆59 Updated 3 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆195 Updated last month
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆260 Updated 7 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆306 Updated 8 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆234 Updated 3 weeks ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆104 Updated 4 months ago
- ☆144 Updated last month
- ☆138 Updated this week
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆441 Updated 6 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆118 Updated last month
- A Comprehensive Survey on Long Context Language Modeling ☆138 Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆177 Updated last month
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆364 Updated 3 months ago
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆205 Updated last week
- ☆163 Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆138 Updated 2 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆149 Updated 7 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆203 Updated last week
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆136 Updated last month
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 5 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆169 Updated 10 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆189 Updated 9 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆154 Updated 10 months ago
- The official repository of the Omni-MATH benchmark. ☆83 Updated 4 months ago