GavinZhengOI/LiveCodeBench-Pro

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/GavinZhengOI/LiveCodeBench-Pro)

GavinZhengOI / LiveCodeBench-Pro

☆176

Alternatives and similar repositories for LiveCodeBench-Pro

Users that are interested in LiveCodeBench-Pro are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

nlp-waseda / traveling-across-languages
View on GitHub
Official repo and evaluation implementation of KnowRecall and VisRecall
☆10May 22, 2025Updated last year
GAIR-NLP / OctoThinker
View on GitHub
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189Jul 23, 2025Updated 11 months ago
QwenLM / CodeElo
View on GitHub
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
☆74Feb 3, 2025Updated last year
bigcode-project / bigcodearena
View on GitHub
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
☆61Oct 13, 2025Updated 9 months ago
stepfun-ai / StepFun-Prover-Preview
View on GitHub
Large language models designed for formal theorem proving through tool-integrated reasoning.
☆33Aug 13, 2025Updated 11 months ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
PRIME-RL / RL-Compositionality
View on GitHub
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆68Jan 26, 2026Updated 5 months ago
LiveCodeBench / LiveCodeBench
View on GitHub
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆910Jul 16, 2025Updated last year
TIGER-AI-Lab / AceCoder
View on GitHub
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆100Apr 9, 2025Updated last year
NJUDeepEngine / CAEF
View on GitHub
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11Oct 11, 2024Updated last year
R2E-Gym / R2E-Gym
View on GitHub
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆307Jul 13, 2025Updated last year
LeiLiLab / HardTestGen
View on GitHub
☆17Jan 27, 2026Updated 5 months ago
hkust-nlp / model-task-align-rl
View on GitHub
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18Feb 9, 2026Updated 5 months ago
fzyzcjy / ai_math_paper_list
View on GitHub
AI for Mathematics Paper List
☆17Jan 14, 2025Updated last year
CriticBench / CriticBench
View on GitHub
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31Mar 5, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
He-Ren / OJBench
View on GitHub
☆32Feb 28, 2026Updated 4 months ago
kyegomez / Reka-Torch
View on GitHub
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆28Jul 13, 2026Updated last week
SkyworkAI / MindLink
View on GitHub
☆100Aug 8, 2025Updated 11 months ago
RUCAIBox / ICPC-Eval
View on GitHub
A new benchmark of 118 ICPC problems for evaluating LLM reasoning in competitive coding, featuring realistic ICPC competition scenario, r…
☆18May 18, 2025Updated last year
lupantech / ineqmath
View on GitHub
Solving Inequality Proofs with Large Language Models.
☆61Dec 15, 2025Updated 7 months ago
BytedTsinghua-SIA / Enigmata
View on GitHub
Resources for the Enigmata Project.
☆82Aug 13, 2025Updated 11 months ago
SWE-Gym / SWE-Gym
View on GitHub
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆708Jul 29, 2025Updated 11 months ago
ganler / code-r1
View on GitHub
Reproducing R1 for Code with Reliable Rewards
☆313May 5, 2025Updated last year
Gen-Verse / CURE
View on GitHub
[NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
☆167Sep 19, 2025Updated 10 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
hkust-nlp / Toolathlon
View on GitHub
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆430Updated this week
Ablustrund / APPS_Plus
View on GitHub
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆73Aug 31, 2024Updated last year
MoonshotAI / Kimi-Researcher
View on GitHub
☆80Jun 20, 2025Updated last year
RUCAIBox / JiuZhang3.0
View on GitHub
The code and data for the paper JiuZhang3.0
☆49May 26, 2024Updated 2 years ago
benchjack / benchjack
View on GitHub
AI agent benchmark hackability scanner — find evaluation vulnerabilities before they undermine your results
☆40May 25, 2026Updated last month
MoonshotAI / CombiBench
View on GitHub
☆52Jun 15, 2026Updated last month
vision-x-nyu / test-set-training
View on GitHub
☆15Nov 25, 2025Updated 7 months ago
LLM360 / Reasoning360
View on GitHub
A repo for open research on building large reasoning models
☆150Jul 3, 2026Updated 2 weeks ago
vision-x-nyu / vstat
View on GitHub
Evaluation code for "Benchmarking Visual State Tracking in Multimodal Video Understanding"
☆34Jun 3, 2026Updated last month
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
MoonshotAI / WorldVQA
View on GitHub
☆118Feb 4, 2026Updated 5 months ago
ZubinGou / math-evaluation-harness
View on GitHub
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆277Apr 26, 2024Updated 2 years ago
SimpleVQA / SimpleVQA
View on GitHub
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
☆15Feb 20, 2025Updated last year
agentica-project / verl-pipeline
View on GitHub
Async pipelined version of Verl
☆124Apr 8, 2025Updated last year
microsoft / SWE-bench-Live
View on GitHub
[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!
☆209Jun 11, 2026Updated last month
FrontierCS / Frontier-CS
View on GitHub
A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.
☆270Updated this week
THUDM / NaturalCodeBench
View on GitHub
NaturalCodeBench (Findings of ACL 2024)
☆70Oct 14, 2024Updated last year