CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
☆65 · Feb 3, 2025 · Updated last year
Alternatives and similar repositories for CodeElo
Users who are interested in CodeElo are comparing it to the repositories listed below.
- ☆16 · Feb 6, 2024 · Updated 2 years ago
- ☆12 · Feb 11, 2026 · Updated 2 weeks ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 · Apr 22, 2025 · Updated 10 months ago
- ☆13 · Jul 5, 2024 · Updated last year
- ☆11 · Aug 10, 2021 · Updated 4 years ago
- Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training. ☆18 · Dec 7, 2022 · Updated 3 years ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆803 · Jul 16, 2025 · Updated 7 months ago
- ☆20 · Oct 10, 2025 · Updated 4 months ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · Oct 2, 2025 · Updated 4 months ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆18 · Jul 21, 2023 · Updated 2 years ago
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆27 · Jul 23, 2025 · Updated 7 months ago
- 🌏 Modular retrievers for zero-shot multilingual IR. ☆30 · Mar 6, 2024 · Updated last year
- ☆21 · Dec 30, 2021 · Updated 4 years ago
- This is a framework for evaluating reasoning in foundational Video Models. ☆57 · Updated this week
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆78 · May 2, 2025 · Updated 10 months ago
- Ranking of fine-tuned HF models as base models. ☆36 · Sep 17, 2025 · Updated 5 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆184 · May 20, 2025 · Updated 9 months ago
- Code for paper 'Data-Efficient FineTuning' ☆28 · May 24, 2023 · Updated 2 years ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Oct 3, 2024 · Updated last year
- The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB). ☆31 · Updated this week
- Example ML projects that use the Determined library. ☆32 · Sep 11, 2024 · Updated last year
- ☆71 · Oct 23, 2025 · Updated 4 months ago
- ☆41 · Mar 26, 2025 · Updated 11 months ago
- ☆232 · Dec 3, 2025 · Updated 2 months ago
- ☆10 · Aug 7, 2024 · Updated last year
- A Data-Driven Approach to Predict the Success of Bank Telemarketing ☆10 · Apr 27, 2021 · Updated 4 years ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆97 · Apr 9, 2025 · Updated 10 months ago
- ☆1,104 · Jan 10, 2026 · Updated last month
- [NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆42 · Jul 7, 2025 · Updated 7 months ago
- Fast and memory-efficient Python PDF Parser based on xpdf sources ☆44 · Dec 15, 2023 · Updated 2 years ago
- Reproducing R1 for Code with Reliable Rewards ☆290 · May 5, 2025 · Updated 9 months ago
- Guide to interviewing for industry machine learning roles (data/applied/research scientist, ML engineer, etc). ☆11 · Dec 28, 2022 · Updated 3 years ago
- ☆11 · Jun 30, 2025 · Updated 8 months ago
- Kernel Playground - A playground to run large scale experiments on the Linux Kernel ☆17 · Nov 8, 2025 · Updated 3 months ago
- Open Source Multivalue String Database ☆13 · Feb 16, 2026 · Updated last week
- A Library for Scaling Mixed-Integer Optimization-Based Machine Learning. ☆12 · Jun 24, 2024 · Updated last year
- A collection of demos and utilities prepared ahead of the Vector Institute Privacy Enhancing Techniques (PETs) Bootcamp. ☆15 · Sep 22, 2022 · Updated 3 years ago
- 💀 gigasmol: a lightweight wrapper for gigachat api model for seamless use with smolagents. ☆15 · Oct 23, 2025 · Updated 4 months ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year