tongyx361 / symeval
Evaluation utilities based on SymPy.
☆20 · Updated 10 months ago
Alternatives and similar repositories for symeval
Users who are interested in symeval are comparing it to the libraries listed below.
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆117 · Updated 10 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆30 · Updated 3 weeks ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆174 · Updated 5 months ago
- ☆47 · Updated 2 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆24 · Updated 10 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆261 · Updated last year
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆236 · Updated last month
- The official repository of the Omni-MATH benchmark. ☆88 · Updated 10 months ago
- ☆67 · Updated 6 months ago
- ☆69 · Updated last year
- GenRM-CoT: Data release for verification rationales ☆67 · Updated last year
- Async pipelined version of Verl ☆125 · Updated 7 months ago
- ☆52 · Updated 5 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆74 · Updated last month
- Collections of RLxLM experiments using minimal codes ☆14 · Updated 8 months ago
- ☆13 · Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆52 · Updated last year
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆31 · Updated last year
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆144 · Updated last year
- Resources for the Enigmata Project. ☆72 · Updated 2 months ago
- ☆28 · Updated last month
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆63 · Updated last year
- ☆58 · Updated last year
- Repo of paper "Free Process Rewards without Process Labels" ☆165 · Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆179 · Updated 3 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆82 · Updated 9 months ago
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist ☆33 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆264 · Updated 6 months ago
- ☆52 · Updated 8 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆89 · Updated 3 weeks ago