CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
☆67 · Feb 3, 2025 · Updated last year
Alternatives and similar repositories for CodeElo
Users who are interested in CodeElo are comparing it to the libraries listed below.
- ☆12 · Feb 11, 2026 · Updated last month
- ☆16 · Feb 6, 2024 · Updated 2 years ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆817 · Jul 16, 2025 · Updated 8 months ago
- ☆14 · Nov 11, 2025 · Updated 4 months ago
- ☆20 · Oct 10, 2025 · Updated 5 months ago
- LLM play 20questions with itself ☆13 · Mar 31, 2023 · Updated 2 years ago
- R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning ☆34 · Feb 9, 2026 · Updated last month
- ☆12 · Aug 8, 2023 · Updated 2 years ago
- ☆71 · Oct 23, 2025 · Updated 4 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 · Mar 6, 2025 · Updated last year
- ☆22 · Dec 17, 2025 · Updated 3 months ago
- Evaluation of LLMs on latest math competitions ☆232 · Mar 10, 2026 · Updated last week
- ☆21 · Feb 10, 2025 · Updated last year
- An archive of learning resources assembled by current Exun members and alumni. ☆15 · Oct 6, 2022 · Updated 3 years ago
- ☆11 · Aug 10, 2021 · Updated 4 years ago
- Apertium tools ☆20 · May 27, 2021 · Updated 4 years ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · Oct 2, 2025 · Updated 5 months ago
- Competitive Programming Code Template ☆11 · Nov 6, 2022 · Updated 3 years ago
- Reproducing R1 for Code with Reliable Rewards ☆297 · May 5, 2025 · Updated 10 months ago
- The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as… ☆19 · Sep 17, 2025 · Updated 6 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆18 · Jul 21, 2023 · Updated 2 years ago
- ☆234 · Feb 28, 2026 · Updated 3 weeks ago
- Data mapping framework for rust stuff ☆49 · Updated this week
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 · Jun 28, 2024 · Updated last year
- Code for CVPR paper: Computationally Budgeted Continual Learning: What Does Matter? ☆17 · Mar 16, 2024 · Updated 2 years ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆14 · Jan 12, 2026 · Updated 2 months ago
- An esoteric programming language with just two data types: null and tape ☆11 · Jan 31, 2024 · Updated 2 years ago
- ☆42 · Mar 26, 2025 · Updated 11 months ago
- my personal mcp server ☆13 · Apr 23, 2025 · Updated 10 months ago
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆27 · Jul 23, 2025 · Updated 7 months ago
- ☆24 · Sep 24, 2024 · Updated last year
- ☆1,113 · Jan 10, 2026 · Updated 2 months ago
- Azure Command-Line Interface ☆12 · Dec 10, 2023 · Updated 2 years ago
- Explore and Control with Adversarial Surprise ☆10 · Jul 20, 2021 · Updated 4 years ago
- ☆21 · Dec 30, 2021 · Updated 4 years ago
- Code for Bayesian inference for queueing networks with incomplete data ☆12 · Jul 5, 2017 · Updated 8 years ago
- 🌏 Modular retrievers for zero-shot multilingual IR. ☆30 · Mar 6, 2024 · Updated 2 years ago
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ☆24 · May 29, 2024 · Updated last year
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆79 · May 2, 2025 · Updated 10 months ago