tongye98 / Awesome-Code-Benchmark
A comprehensive review of code-domain benchmarks for LLM research.
☆39 · Updated last week
Alternatives and similar repositories for Awesome-Code-Benchmark
Users interested in Awesome-Code-Benchmark are comparing it to the repositories listed below.
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆98 · Updated last week
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024) ☆55 · Updated this week
- [NAACL 2024 Outstanding Paper] Source code for the paper "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆114 · Updated 11 months ago
- Awesome LLM Self-Consistency: a curated list of self-consistency in Large Language Models ☆99 · Updated 10 months ago
- Collection of papers on scalable automated alignment. ☆91 · Updated 8 months ago
- ☆116 · Updated last month
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆81 · Updated 10 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆59 · Updated 6 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆26 · Updated last month
- Code implementation of synthetic continued pretraining ☆114 · Updated 5 months ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆39 · Updated 3 months ago
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages. ☆55 · Updated 8 months ago
- A Comprehensive Benchmark for Software Development. ☆108 · Updated last year
- A Comprehensive Survey on Long Context Language Modeling ☆151 · Updated 2 weeks ago
- Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis". ☆13 · Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 8 months ago
- ☆46 · Updated last year
- Critique-out-Loud Reward Models ☆66 · Updated 8 months ago
- The repo for In-context Autoencoder ☆128 · Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 · Updated last month
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆77 · Updated 11 months ago
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling" ☆25 · Updated last year
- NaturalCodeBench (Findings of ACL 2024) ☆65 · Updated 8 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL 2025] ☆85 · Updated 2 months ago
- Official repo for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆125 · Updated last year
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆141 · Updated last month
- Official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆158 · Updated last month
- A benchmark list for the evaluation of large language models. ☆127 · Updated last month
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆58 · Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral, ACL 2024 SRW ☆63 · Updated 8 months ago