bigcode-project / bigcodebench-annotation

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

☆23

Alternatives and similar repositories for bigcodebench-annotation

Users who are interested in bigcodebench-annotation are comparing it to the repositories listed below.


qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral, ACL 2024 SRW
☆61 · Updated 9 months ago
crux-eval / eval-arena
☆28 · Updated last week
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆58 · Updated last year
Ablustrund / APPS_Plus
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆67 · Updated 10 months ago
shunzh / Code-AI-Tree-Search
☆119 · Updated last year
protagolabs / odyssey-math
☆84 · Updated 6 months ago
MLE-Dojo / MLE-Dojo
☆57 · Updated this week
R2E-Gym / R2E-Gym
Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆128 · Updated last week
amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆34 · Updated 8 months ago
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆149 · Updated 9 months ago
SparksofAGI / MHPP
☆31 · Updated last month
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data for the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 · Updated last year
LZhengisme / self-infilling
[ICML 2024] Self-Infilling Code Generation
☆18 · Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" [COLM 2024]
☆139 · Updated 8 months ago
YuxiXie / SelfEval-Guided-Decoding
☆99 · Updated last year
CodeEditorBench / CodeEditorBench
☆49 · Updated last year
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆101 · Updated last month
rmshin / llm-mcts
☆41 · Updated last year
reddy-lab-code-research / PPOCoder
Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"
☆114 · Updated last year
google-research / arcade-nl2code
☆54 · Updated last year
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆119 · Updated 6 months ago
zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆48 · Updated last year
Zayne-sprague / MuSR
☆48 · Updated 11 months ago
huggingface / ioi
☆35 · Updated 4 months ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81 · Updated 11 months ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆102 · Updated 4 months ago
hkust-nlp / llm-compression-intelligence
Official GitHub repository for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆139 · Updated 10 months ago
SalesforceAIResearch / swecomm
☆27 · Updated 6 months ago
Zyq-scut / RLTF
Accepted to Transactions on Machine Learning Research (TMLR)
☆130 · Updated 9 months ago