allenai / MacGyver
Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers?
☆22Updated 7 months ago
Related projects
Alternatives and complementary repositories for MacGyver
- Is In-Context Learning Sufficient for Instruction Following in LLMs?☆23Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆36Updated 8 months ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.☆24Updated last year
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"☆28Updated 2 years ago
- Byte-sized text games for code generation tasks on virtual environments☆17Updated 4 months ago
- Few-shot Learning with Auxiliary Data☆26Updated 11 months ago
- Supporting code for ReCEval paper☆26Updated last month
- Repository for Skill Set Optimization☆12Updated 3 months ago
- Tasks for describing differences between text distributions.☆16Updated 3 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 9 months ago
- Restore safety in fine-tuned language models through task arithmetic☆25Updated 7 months ago
- Adding new tasks to T0 without catastrophic forgetting☆30Updated 2 years ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…☆54Updated last year
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings☆16Updated last year
- Evaluate the Quality of Critique☆35Updated 5 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models".☆39Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated 8 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆48Updated 7 months ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support…☆35Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆28Updated 4 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 8 months ago
- Directional Preference Alignment☆49Updated last month
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models"☆51Updated last year
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20…☆27Updated last year