wumingqi/LLM-Math-Evaluation

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.

☆21

Alternatives and similar repositories for LLM-Math-Evaluation

Users who are interested in LLM-Math-Evaluation are comparing it to the repositories listed below.

haolunc / iGSM-Replication-physics-LLM
This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu.
☆17 · Sep 13, 2024 · Updated last year
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆56 · Jun 13, 2025 · Updated last year
zzhang0179 / Unveiling-Linguistic-Regions-in-LLMs
[ACL 2024] Unveiling Linguistic Regions in Large Language Models
☆34 · Jun 9, 2024 · Updated 2 years ago
zepingyu0512 / arithmetic-mechanism
code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
Lemon-cmd / diffusion-jax
Diffusion Probabilistic Model in Jax
☆13 · Apr 20, 2024 · Updated 2 years ago
ElvishElvis / LCA-on-the-line
LCA-on-the-line (ICML 2024 Oral)
☆14 · Feb 13, 2025 · Updated last year
VIM-Bench / VIM_TOOL
☆12 · Jun 12, 2024 · Updated 2 years ago
SURF-ML / 2D-VQ-AE-2
2D Vector-Quantized Auto-Encoder for compression of Whole-Slide Images in Histopathology
☆16 · Jul 18, 2024 · Updated 2 years ago
apartresearch / Integer_Addition
✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
☆19 · Aug 16, 2024 · Updated last year
LAVA-LAB / safe-slac
Safe SLAC, an algorithm for safe cost-constrained reinforcement learning in high-dimensional POMDPs.
☆11 · Mar 1, 2023 · Updated 3 years ago
LLM-MI-Research / Actionable-MI
☆15 · Jan 20, 2026 · Updated 6 months ago
WujiangXu / EPO
The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆40 · Jul 13, 2026 · Updated last week
zerolllin / Delta-L-Normalization
☆16 · Oct 11, 2025 · Updated 9 months ago
lrs1353281004 / ChatGPT_recipes
Continuously tracking technical resources and industry developments related to ChatGPT.
☆11 · Apr 24, 2023 · Updated 3 years ago
RUCBM / ICLEval
☆14 · Jun 24, 2024 · Updated 2 years ago
flowersteam / EAGER
☆10 · Oct 11, 2022 · Updated 3 years ago
horizon-llm / Think-RM
[NeurIPS 2025] Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
☆17 · Nov 2, 2025 · Updated 8 months ago
guanrenyang / Tiny-TPU
☆10 · Dec 15, 2023 · Updated 2 years ago
languini-kitchen / languini-kitchen
The official Languini Kitchen repository
☆14 · May 6, 2024 · Updated 2 years ago
wenquanlu / huginn-latent-cot
[COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-…
☆19 · Oct 4, 2025 · Updated 9 months ago
IntologyAI / NanoGPT-Bench
☆21 · Jul 3, 2026 · Updated 2 weeks ago
amazon-science / PAE
☆70 · Mar 6, 2025 · Updated last year
xiaojunxu / learning-to-watermark-llm
☆22 · Mar 19, 2024 · Updated 2 years ago
Sphere-AI-Lab / OrthoMerge
Implementation of <Orthogonal Model Merging>
☆33 · May 27, 2026 · Updated last month
THU-KEG / VerIF
[EMNLP 2025] Verification Engineering for RL in Instruction Following
☆57 · Mar 30, 2026 · Updated 3 months ago
MangoKiller / SimOAR_OAR
☆11 · Nov 8, 2023 · Updated 2 years ago
domaineval / DomainEval
DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …
☆13 · Dec 12, 2024 · Updated last year
alestolfo / lm-arithmetic
Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis"
☆20 · Jun 12, 2025 · Updated last year
nji3 / PCA_Autoencoder_FisherFace
Using PCA, Autoencoder and Fisher linear discriminant to extract the effective representations from the face images. Do the reconstructio…
☆12 · Apr 23, 2019 · Updated 7 years ago
Philip-MIT / rover-vlm
☆18 · Dec 1, 2025 · Updated 7 months ago
mcleish7 / retrofitting-recurrence
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
☆68 · Nov 11, 2025 · Updated 8 months ago
facebookresearch / dual-system-for-visual-language-reasoning
Github repo for Peifeng's internship project
☆13 · Nov 7, 2023 · Updated 2 years ago
THU-KEG / Crab
[CIKM 2025] Constraint Back-translation Improves Complex Instruction Following of Large Language Models
☆18 · May 23, 2025 · Updated last year
flathub / io.qt.qtwebengine.BaseApp
☆12 · Jul 14, 2026 · Updated last week
merlresearch / SMART
Training and testing code from our CVPR 2023 paper "Are Deep Neural Networks SMARTer than Second Graders?"
☆11 · Aug 10, 2023 · Updated 2 years ago
GraphPKU / CoI
Chain of Images for Intuitively Reasoning
☆10 · Nov 29, 2023 · Updated 2 years ago
bryanchrist / MathNeuro
Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes
☆23 · Jun 15, 2025 · Updated last year
liyingxuan1012 / zeroshot-speaker-prediction
Official repository of "Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion" (ACMMM 2024)
☆16 · Oct 31, 2024 · Updated last year
prnake / Comment9
A simple & powerful danmaku framework.
☆14 · Mar 17, 2023 · Updated 3 years ago