richardodliu / OpenCodeEval
☆48 · Updated 3 months ago
Alternatives and similar repositories for OpenCodeEval
Users interested in OpenCodeEval are comparing it to the libraries listed below.
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆52 · Updated last year
- [NeurIPS'24] Official code for "🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving" ☆119 · Updated 11 months ago
- Async pipelined version of Verl ☆125 · Updated 7 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆240 · Updated 2 months ago
- Repository of LV-Eval Benchmark ☆71 · Updated last year
- Evaluation utilities based on SymPy ☆20 · Updated 11 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆179 · Updated 6 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆32 · Updated last month
- A Comprehensive Survey on Long Context Language Modeling ☆204 · Updated this week
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings ☆56 · Updated 9 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset ☆18 · Updated 7 months ago
- The official repository of the Omni-MATH benchmark ☆88 · Updated 11 months ago
- [ACL 2024] LooGLE: Long Context Evaluation for Long-Context Language Models ☆191 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆272 · Updated 6 months ago
- ☆120 · Updated 5 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆116 · Updated 6 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆36 · Updated 9 months ago
- ☆65 · Updated last year
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆180 · Updated 4 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆112 · Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks 🧮✨ ☆267 · Updated last year
- ☆77 · Updated 8 months ago
- The HELMET Benchmark ☆186 · Updated 3 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆128 · Updated last year
- The code and data for the paper JiuZhang3.0 ☆49 · Updated last year
- ☆76 · Updated last year
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆222 · Updated last year
- ☆34 · Updated last year
- ☆46 · Updated 6 months ago
- ☆20 · Updated last month