allenai / MacGyver
Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers?
☆24Updated 9 months ago
Alternatives and similar repositories for MacGyver:
Users that are interested in MacGyver are comparing it to the libraries listed below
- Tasks for describing differences between text distributions.☆16Updated 5 months ago
- Supporting code for ReCEval paper☆27Updated 4 months ago
- ☆38Updated 3 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆46Updated last year
- Evaluate the Quality of Critique☆35Updated 7 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…☆58Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models".☆39Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆53Updated 10 months ago
- Adding new tasks to T0 without catastrophic forgetting☆32Updated 2 years ago
- ☆36Updated 5 months ago
- ☆15Updated 5 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc…☆31Updated 7 months ago
- ☆20Updated 7 months ago
- Official project for the paper "Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers"☆28Updated last year
- ☆44Updated 4 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval☆66Updated 2 weeks ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆51Updated 9 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆47Updated 8 months ago
- ☆38Updated 5 months ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.☆25Updated last year
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆23Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆55Updated 6 months ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"☆107Updated last year
- ☆40Updated last year
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment☆38Updated last year
- Lightweight tool to identify Data Contamination in LLMs evaluation☆45Updated 10 months ago
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models"☆29Updated last year
- ☆26Updated 6 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆44Updated 3 weeks ago
- ☆29Updated 9 months ago