tongyx361 / symeval
Evaluation utilities based on SymPy.
☆20 Updated 10 months ago
Alternatives and similar repositories for symeval
Users who are interested in symeval are comparing it to the libraries listed below.
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆115 Updated 10 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆28 Updated last week
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆174 Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆231 Updated last month
- ☆47 Updated last month
- ☆67 Updated 6 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆258 Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH ☆23 Updated 9 months ago
- ☆51 Updated 4 months ago
- Async pipelined version of Verl ☆119 Updated 6 months ago
- ☆208 Updated 6 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆51 Updated 11 months ago
- The official repository of the Omni-MATH benchmark. ☆88 Updated 9 months ago
- ☆69 Updated last year
- ☆74 Updated 11 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆86 Updated 4 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆69 Updated last week
- a survey of long-context LLMs from four perspectives, architecture, infrastructure, training, and evaluation ☆58 Updated 6 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆112 Updated 2 months ago
- ☆13 Updated last year
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆143 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆114 Updated 5 months ago
- ☆58 Updated last year
- ☆52 Updated 7 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆30 Updated last year
- Collections of RLxLM experiments using minimal codes ☆14 Updated 8 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆215 Updated last year
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆82 Updated 9 months ago
- ☆20 Updated 10 months ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆79 Updated 2 years ago