allenai / fermi
☆28 Updated 3 years ago
Related projects
Alternatives and complementary repositories for fermi
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆44 Updated 10 months ago
- Supporting code for ReCEval paper ☆26 Updated last month
- A unified benchmark for math reasoning ☆87 Updated last year
- DEMix Layers for Modular Language Modeling ☆53 Updated 3 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆55 Updated last year
- ☆33 Updated 2 years ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ☆28 Updated 5 months ago
- ☆46 Updated last month
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics". ☆16 Updated 2 years ago
- ☆22 Updated 2 years ago
- OpenPI dataset for tracking entities in open domain procedural text ☆21 Updated 3 months ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆16 Updated last year
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022) ☆71 Updated 2 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 Updated 3 months ago
- ☆23 Updated 2 months ago
- SILO Language Models code repository ☆80 Updated 8 months ago
- ☆12 Updated 5 months ago
- Code for "Natural Language to Code Translation with Execution" ☆39 Updated 2 years ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆57 Updated 2 months ago
- ☆44 Updated 2 months ago
- EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443 ☆81 Updated last month
- ☆17 Updated 11 months ago
- Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023… ☆29 Updated last year
- Automatic metrics for GEM tasks ☆61 Updated 2 years ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆58 Updated 7 months ago
- [EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning" ☆34 Updated last year
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆51 Updated 5 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆41 Updated 9 months ago
- Few-shot Learning with Auxiliary Data ☆26 Updated 11 months ago
- Repository for Skill Set Optimization ☆12 Updated 3 months ago