CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
☆74 · Feb 3, 2025 · Updated last year
Alternatives and similar repositories for CodeElo
Users who are interested in CodeElo are comparing it to the libraries listed below.
- Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg… ☆31 · Oct 10, 2025 · Updated 6 months ago
- ☆12 · Feb 11, 2026 · Updated 2 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆852 · Jul 16, 2025 · Updated 9 months ago
- Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training. ☆18 · Dec 7, 2022 · Updated 3 years ago
- Temporal Knowledge Graph Question Answering via Subgraph Reasoning ☆16 · Mar 23, 2025 · Updated last year
- ☆12 · Aug 8, 2023 · Updated 2 years ago
- James' cookbook of evaluations and finetuning experiments ☆26 · Feb 19, 2026 · Updated 2 months ago
- ☆72 · Oct 23, 2025 · Updated 6 months ago
- ☆24 · Dec 17, 2025 · Updated 4 months ago
- An archive of learning resources assembled by current Exun members and alumni. ☆15 · Apr 8, 2026 · Updated 3 weeks ago
- Evaluation of LLMs on latest math competitions ☆256 · Apr 22, 2026 · Updated last week
- A stateless password management solution ☆10 · Sep 11, 2018 · Updated 7 years ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · Oct 2, 2025 · Updated 6 months ago
- A lambda calculus parser, evaluator and repl ☆11 · Oct 30, 2021 · Updated 4 years ago
- Reproducing R1 for Code with Reliable Rewards ☆310 · May 5, 2025 · Updated 11 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆18 · Jul 21, 2023 · Updated 2 years ago
- The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as… ☆19 · Sep 17, 2025 · Updated 7 months ago
- ☆239 · Feb 28, 2026 · Updated 2 months ago
- Code for CVPR paper: Computationally Budgeted Continual Learning: What Does Matter? ☆17 · Mar 16, 2024 · Updated 2 years ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 · Jun 28, 2024 · Updated last year
- ☆42 · Mar 26, 2025 · Updated last year
- my personal mcp server ☆13 · Apr 23, 2025 · Updated last year
- Implement your own small language model ☆11 · Jun 15, 2024 · Updated last year
- ☆24 · Sep 24, 2024 · Updated last year
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆28 · Jul 23, 2025 · Updated 9 months ago
- ☆1,135 · Jan 10, 2026 · Updated 3 months ago
- Azure Command-Line Interface ☆15 · Mar 26, 2026 · Updated last month
- Explore and Control with Adversarial Surprise ☆10 · Jul 20, 2021 · Updated 4 years ago
- ☆21 · Dec 30, 2021 · Updated 4 years ago
- Code for Bayesian inference for queueing networks with incomplete data ☆12 · Jul 5, 2017 · Updated 8 years ago
- ☆14 · Jan 21, 2025 · Updated last year
- Chef cookbooks for managing a Ceph cluster ☆12 · Apr 2, 2023 · Updated 3 years ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆80 · May 2, 2025 · Updated 11 months ago
- VectorDefense: Vectorization as a Defense to Adversarial Examples ☆13 · May 3, 2018 · Updated 7 years ago
- Released Apertium translation pairs ☆31 · May 27, 2021 · Updated 4 years ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Oct 3, 2025 · Updated 6 months ago
- Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours ☆293 · Updated this week
- "We must know. We shall know." - David Hilbert ☆21 · Sep 8, 2025 · Updated 7 months ago
- Code for paper 'Data-Efficient FineTuning' ☆28 · May 24, 2023 · Updated 2 years ago