Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.
☆21 · Jul 18, 2025 · Updated 7 months ago
Alternatives and similar repositories for LLM-Math-Evaluation
Users who are interested in LLM-Math-Evaluation are comparing it to the repositories listed below
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu. ☆17 · Sep 13, 2024 · Updated last year
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 · Dec 12, 2024 · Updated last year
- ☆10 · Oct 11, 2022 · Updated 3 years ago
- Evaluation Pipeline for medical tasks. ☆12 · Feb 13, 2026 · Updated 2 weeks ago
- Training and testing code from our CVPR 2023 paper "Are Deep Neural Networks SMARTer than Second Graders?" ☆11 · Aug 10, 2023 · Updated 2 years ago
- ProxyExplainer for Graph Neural Networks ☆15 · Oct 24, 2024 · Updated last year
- The official code of TACL 2022, "Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition". ☆11 · Oct 18, 2021 · Updated 4 years ago
- ☆10 · Jun 11, 2023 · Updated 2 years ago
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-… ☆17 · Oct 4, 2025 · Updated 4 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆56 · May 2, 2025 · Updated 10 months ago
- Line shuffler for huge text files that do not fit in memory ☆13 · Dec 1, 2022 · Updated 3 years ago
- ☆10 · Jun 28, 2025 · Updated 8 months ago
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training. ☆19 · Feb 7, 2025 · Updated last year
- Repo for the walking robot's vision-based navigation code ☆10 · Jun 6, 2023 · Updated 2 years ago
- ☆10 · Nov 29, 2024 · Updated last year
- LCA-on-the-line (ICML 2024 Oral) ☆13 · Feb 13, 2025 · Updated last year
- Code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis ☆12 · Nov 17, 2024 · Updated last year
- ☆13 · Dec 5, 2022 · Updated 3 years ago
- Official implementation of our NeurIPS 2024 paper "Boundary Matters: A Bi-Level Active Finetuning Method" ☆14 · Feb 11, 2025 · Updated last year
- Xiaomi miwifi remote Prometheus exporter; only supports the official framework. ☆11 · Nov 9, 2025 · Updated 3 months ago
- UFT: Unifying Supervised and Reinforcement Fine-Tuning ☆25 · Jun 30, 2025 · Updated 8 months ago
- Code for KDD feasibility ☆12 · Jul 17, 2023 · Updated 2 years ago
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? ☆15 · Jun 3, 2025 · Updated 8 months ago
- ☆12 · Sep 12, 2024 · Updated last year
- ☆10 · Dec 15, 2023 · Updated 2 years ago
- Cell-Level RSRP Estimation with the Image-to-Image Wireless Propagation Model Based on Measured Data. ☆13 · Oct 10, 2023 · Updated 2 years ago
- The official data and code for EMNLP 2023 main conference paper: CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular D… ☆13 · May 19, 2025 · Updated 9 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆23 · Mar 4, 2025 · Updated 11 months ago
- Rethinking the Trust Region in LLM Reinforcement Learning ☆38 · Feb 5, 2026 · Updated 3 weeks ago
- ☆11 · Jun 12, 2024 · Updated last year
- ☆18 · Apr 10, 2025 · Updated 10 months ago
- Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models ☆22 · Jan 6, 2026 · Updated last month
- GitHub repo for Peifeng's internship project ☆13 · Nov 7, 2023 · Updated 2 years ago
- ☆10 · Nov 16, 2023 · Updated 2 years ago
- Code, notebooks, and data for the paper "Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large L… ☆12 · Jan 23, 2024 · Updated 2 years ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆40 · Dec 1, 2025 · Updated 3 months ago
- This is the project for IRM methods ☆12 · Sep 13, 2021 · Updated 4 years ago
- This repository hosts the source code for the paper "ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Mo… ☆16 · Dec 16, 2025 · Updated 2 months ago
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context ☆18 · Nov 15, 2024 · Updated last year