wala / blanca
BLANCA - Benchmarks for LANguage models on Coding Artifacts
☆8 Updated 2 years ago
Alternatives and similar repositories for blanca:
Users who are interested in blanca compare it to the repositories listed below
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆45 Updated last year
- ☆15 Updated 3 years ago
- We introduce FixEval, a dataset for competitive programming bug fixing along with a comprehensive test suite and show the necessity of e… ☆22 Updated 2 years ago
- ☆40 Updated 5 months ago
- ☆23 Updated 2 weeks ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 Updated last year
- ☆121 Updated last year
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code ☆70 Updated 7 months ago
- The dataset for the variable-misuse task, used in the ICLR 2020 paper 'Global Relational Models of Source Code' [https://openreview.net/f… ☆22 Updated 4 years ago
- Code for "StructCoder: Structure-Aware Transformer for Code Generation" ☆70 Updated last year
- Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation. ☆53 Updated 6 months ago
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning" ☆106 Updated last year
- ☆74 Updated last year
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 Updated last year
- A plugin for code generation in PyCharm/IntelliJ using tranX ☆35 Updated 2 years ago
- Code for the paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) ☆81 Updated last year
- ☆59 Updated 8 months ago
- ☆41 Updated 8 months ago
- PROSE Public Benchmark Suite ☆24 Updated 3 months ago
- Official code release for the paper Coder Reviewer Reranking for Code Generation. ☆42 Updated last year
- Source code for the paper "ReACC: A Retrieval-Augmented Code Completion Framework" ☆60 Updated 2 years ago
- Releasing code for "ReCode: Robustness Evaluation of Code Generation Models" ☆52 Updated 10 months ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation ☆28 Updated last month
- Incremental Python parser for constrained generation of code by LLMs. ☆15 Updated 4 months ago
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning ☆38 Updated 2 years ago
- Can Language Models Replace Programmers? RepoCod Says ‘Not Yet’ - by Shanchao Liang and Yiran Hu and Nan Jiang and Lin Tan ☆15 Updated 2 weeks ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆77 Updated 9 months ago
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022) ☆71 Updated 2 years ago
- PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. W… ☆87 Updated 2 years ago
- Training and Benchmarking LLMs for Code Preference. ☆29 Updated 2 months ago