UKPLab / acl2024-ircoder
Data creation, training and eval scripts for the IRCoder paper
☆20 · Updated last year
Alternatives and similar repositories for acl2024-ircoder
Users interested in acl2024-ircoder are comparing it to the libraries listed below.
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆85 · Updated last year
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning" ☆118 · Updated 2 years ago
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation". ☆267 · Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆132 · Updated last year
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆170 · Updated 5 months ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆67 · Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆119 · Updated last year
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆129 · Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆81 · Updated last year
- Source code for the paper "ReACC: A Retrieval-Augmented Code Completion Framework" ☆65 · Updated 3 years ago
- Awesome LLM Self-Consistency: a curated list of self-consistency in Large Language Models ☆119 · Updated 6 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆136 · Updated last year
- Baselines for all tasks from Long Code Arena benchmarks 🏟️ ☆39 · Updated 10 months ago
- APIBench is a benchmark for evaluating the performance of API recommendation approaches released in the paper "Revisiting, Benchmarking a… ☆65 · Updated 2 years ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆41 · Updated 11 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation ☆109 · Updated last week
- Data and Code for Program of Thoughts [TMLR 2023] ☆303 · Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs) ☆59 · Updated last year
- ☆33 · Updated 2 years ago
- Implementation of the paper "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆75 · Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆151 · Updated last year
- Implementation of the ICML 2023 paper "Specializing Smaller Language Models towards Multi-Step Reasoning" ☆132 · Updated 2 years ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task ☆154 · Updated 5 months ago
- ☆187 · Updated 7 months ago
- Code for the paper "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning" ☆48 · Updated 2 years ago
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆240 · Updated 2 years ago
- ☆22 · Updated 2 years ago
- ☆294 · Updated 2 years ago
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆56 · Updated last year
- ☆33 · Updated 4 months ago