QwenLM/CodeElo

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

☆74

Alternatives and similar repositories for CodeElo

Users who are interested in CodeElo are comparing it to the repositories listed below.

QwenLM / PolyMath
[NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"
☆43 · May 22, 2025 · Updated last year
QwenLM / ConsisEval
☆14 · Jul 5, 2024 · Updated 2 years ago
GavinZhengOI / LiveCodeBench-Pro
☆176 · Dec 13, 2025 · Updated 7 months ago
LiveCodeBench / LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆911 · Jul 16, 2025 · Updated last year
Ginjing-Yuan / QWen2-from_ground_up
☆22 · Jul 15, 2024 · Updated 2 years ago
jinlanfu / Polyglot_Prompt
Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training.
☆18 · Dec 7, 2022 · Updated 3 years ago
Sanster / padding_free_llm_train
☆16 · Feb 6, 2024 · Updated 2 years ago
LinghaoChan / HumanTOMATO
Web page for "🍅HumanTOMATO: Text-aligned Whole-body Motion Generation".
☆15 · May 25, 2024 · Updated 2 years ago
kwaipilot / SWE-Compass
☆18 · Mar 28, 2026 · Updated 3 months ago
facebookresearch / Multi-IF
The evaluation code for MultiIF multi-turn and multi-lingual instruction following
☆63 · Oct 29, 2024 · Updated last year
QwenLM / qwen-code-action
A GitHub Action that integrates Qwen Code into your development workflow.
☆31 · Jul 9, 2026 · Updated last week
THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆70 · Oct 14, 2024 · Updated last year
evanthebouncy / 20Q-selfplay
LLM play 20questions with itself
☆13 · Mar 31, 2023 · Updated 3 years ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
shash42 / Evaluating-Inexact-Unlearning
☆12 · Aug 8, 2023 · Updated 2 years ago
nikhilchandak / answer-matching
Code for 'Answer Matching Outperforms Multiple Choice for Language Model Evaluation' paper
☆18 · Jul 4, 2025 · Updated last year
microsoft / amlFilesystem-lustre
Lustre Repository with MS patches
☆16 · Updated this week
LLM360 / TxT360
☆25 · Dec 18, 2024 · Updated last year
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆192 · May 25, 2025 · Updated last year
microsoft / go-deviceid
Golang package for setting/retrieving a device id.
☆18 · Apr 24, 2025 · Updated last year
anthropics / rogue-deploy-eval
☆16 · Jan 21, 2025 · Updated last year
knoveleng / open-rs
[AAAI 2026] - Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆291 · Mar 11, 2026 · Updated 4 months ago
model-similarity / lm-similarity
☆21 · Feb 10, 2025 · Updated last year
iiis-ai / IterativeQuestionComposing
[AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)
☆23 · Oct 2, 2025 · Updated 9 months ago
FlagOpen / TACO
☆239 · Feb 28, 2026 · Updated 4 months ago
QwenLM / Qwen-Cookbook
Open-source examples and guides for building with the Qwen. Browse a collection of snippets, advanced techniques and walkthroughs.
☆38 · Nov 20, 2024 · Updated last year
BinWang28 / FacEval
EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization
☆13 · Mar 20, 2025 · Updated last year
microsoft / Attestation-Client-Samples
☆16 · Nov 18, 2025 · Updated 8 months ago
exunclan / resources
An archive of learning resources assembled by current Exun members and alumni.
☆15 · Jun 23, 2026 · Updated 3 weeks ago
renxingkai / MRC_Leaderboard
Machine Reading Comprehension Leadboard Summary
☆12 · Jan 4, 2021 · Updated 5 years ago
czy1999 / TKGQA_Competition_Baseline
Baseline model for multi-granularity temporal knowledge graph question answering
☆23 · May 28, 2023 · Updated 3 years ago
stepfun-ai / StepFun-Formalizer
StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion
☆29 · Aug 19, 2025 · Updated 11 months ago
Backl1ght / competitive-programming-code-template
Competitive Programming Code Template
☆10 · Nov 6, 2022 · Updated 3 years ago
microsoft / kiota-abstractions-go
Abstractions library for the Kiota generated SDKs in go
☆19 · Updated this week
MetabrainAGI / Awaker2.5-R1
☆12 · Mar 22, 2025 · Updated last year
szhang42 / Calibration_qa
☆11 · Aug 10, 2021 · Updated 4 years ago
microsoft / poolprovider-for-k8s
Kubernetes based pool provider implementation for Azure DevOps pipelines
☆15 · Mar 18, 2025 · Updated last year
0xWJ / code-judge
☆24 · Oct 10, 2025 · Updated 9 months ago
stepfun-ai / StepFun-Prover-Preview
Large language models designed for formal theorem proving through tool-integrated reasoning.
☆33 · Aug 13, 2025 · Updated 11 months ago