zorazrw/odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆47 · Updated last year
Alternatives and similar repositories for odex:
Users interested in odex are comparing it to the repositories listed below.
- Code for the paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) ☆86 · Updated last year
- ☆24 · Updated 6 months ago
- ☆75 · Updated last month
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | ACL 2024 SRW, Oral ☆59 · Updated 7 months ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆78 · Updated last year
- [EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages ☆23 · Updated 2 years ago
- [ACL'24] Code and data for the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated last year
- ☆23 · Updated 7 months ago
- ☆36 · Updated 10 months ago
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022) ☆71 · Updated 2 years ago
- Training and Benchmarking LLMs for Code Preference ☆33 · Updated 5 months ago
- A unified benchmark for math reasoning ☆88 · Updated 2 years ago
- ☆42 · Updated last month
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 · Updated 2 years ago
- This repository contains data, code, and models for contextual noncompliance. ☆22 · Updated 9 months ago
- Code for the arXiv paper "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 3 months ago
- CodeUltraFeedback: aligning large language models to coding preferences ☆71 · Updated 10 months ago
- ☆115 · Updated 9 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 · Updated last year
- Run SWE-bench evaluations remotely ☆11 · Updated this week
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" ☆44 · Updated last year
- ☆21 · Updated 2 years ago
- Repo for the ICML'23 paper "Why do Nearest Neighbor Language Models Work?" ☆56 · Updated 2 years ago
- ☆44 · Updated 11 months ago
- ☆46 · Updated last year
- Code and data for the paper "Context-faithful Prompting for Large Language Models" ☆39 · Updated 2 years ago
- Supporting code for the ReCEval paper ☆28 · Updated 7 months ago
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages ☆53 · Updated 6 months ago
- Benchmarking Generalization to New Tasks from Natural Language Instructions ☆26 · Updated 3 years ago
- The official code for the EMNLP 2022 paper "SCROLLS: Standardized CompaRison Over Long Language Sequences" ☆69 · Updated last year