zorazrw/odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆47 · Updated last year
Alternatives and similar repositories for odex:
Users interested in odex are comparing it to the repositories listed below.
- Code for the paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) ☆86 · Updated last year
- ☆24 · Updated 6 months ago
- ☆75 · Updated last month
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | ACL 2024 SRW, Oral ☆59 · Updated 7 months ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆78 · Updated last year
- [EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages ☆23 · Updated 2 years ago
- [ACL'24] Code and data for the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated last year
- ☆23 · Updated 7 months ago
- ☆36 · Updated 10 months ago
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022) ☆71 · Updated 2 years ago
- Training and Benchmarking LLMs for Code Preference ☆33 · Updated 5 months ago
- A unified benchmark for math reasoning ☆88 · Updated 2 years ago
- ☆42 · Updated last month
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 · Updated 2 years ago
- This repository contains data, code, and models for contextual noncompliance. ☆22 · Updated 9 months ago
- Code for the arXiv paper "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 3 months ago
- CodeUltraFeedback: aligning large language models to coding preferences ☆71 · Updated 10 months ago
- ☆115 · Updated 9 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 · Updated last year
- Run SWE-bench evaluations remotely ☆11 · Updated this week
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" ☆44 · Updated last year
- ☆21 · Updated 2 years ago
- Repo for the ICML'23 paper "Why do Nearest Neighbor Language Models Work?" ☆56 · Updated 2 years ago
- ☆44 · Updated 11 months ago
- ☆46 · Updated last year
- Code and data for the paper "Context-faithful Prompting for Large Language Models" ☆39 · Updated 2 years ago
- Supporting code for the ReCEval paper ☆28 · Updated 7 months ago
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages ☆53 · Updated 6 months ago
- Benchmarking Generalization to New Tasks from Natural Language Instructions ☆26 · Updated 3 years ago
- The official code for the EMNLP 2022 paper "SCROLLS: Standardized CompaRison Over Long Language Sequences" ☆69 · Updated last year