infi-coder / infibench-evaluation-harness
The InfiBench variant of bigcode-evaluation-harness, a framework for evaluating autoregressive code generation language models.
☆14 · Updated last year
Alternatives and similar repositories for infibench-evaluation-harness
Users who are interested in infibench-evaluation-harness compare it to the libraries listed below.
- NaturalCodeBench (Findings of ACL 2024) ☆69 · Updated last year
- ☆28 · Updated 3 months ago
- ☆46 · Updated 8 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆74 · Updated last year
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆63 · Updated last year
- Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules" ☆49 · Updated 3 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆64 · Updated last year
- ☆56 · Updated last year
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Updated 10 months ago
- ☆100 · Updated 6 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆165 · Updated last year
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆46 · Updated 5 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆60 · Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆85 · Updated last year
- Automatic prompt optimization framework for multi-step agent tasks. ☆36 · Updated last year
- Open Implementations of LLM Analyses ☆107 · Updated last year
- ☆131 · Updated 9 months ago
- ☆31 · Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆49 · Updated 2 years ago
- A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages. ☆62 · Updated last year
- Reformatted Alignment ☆111 · Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆107 · Updated 11 months ago
- Multi-Granularity LLM Debugger [ICSE2026] ☆96 · Updated 7 months ago
- ☆33 · Updated last week
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Updated 4 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆69 · Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆59 · Updated last year
- ☆51 · Updated last year
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods. ☆23 · Updated 2 years ago
- ☆90 · Updated 3 months ago