allenai / fermi
☆28 Updated 3 years ago
Related projects
Alternatives and complementary repositories for fermi
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆44 Updated 10 months ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 2 years ago
- Supporting code for ReCEval paper ☆26 Updated 2 months ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆55 Updated 2 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 Updated 3 months ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆18 Updated 2 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆30 Updated 8 months ago
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers? ☆22 Updated 7 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 Updated 7 months ago
- ☆11 Updated 5 months ago
- Repository for Skill Set Optimization ☆12 Updated 3 months ago
- A unified benchmark for math reasoning ☆87 Updated last year
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆51 Updated last year
- ☆33 Updated 3 years ago
- ☆25 Updated last week
- DEMix Layers for Modular Language Modeling ☆53 Updated 3 years ago
- ☆18 Updated 5 months ago
- Few-shot Learning with Auxiliary Data ☆26 Updated 11 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆41 Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆44 Updated last year
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022) ☆71 Updated 2 years ago
- ☆24 Updated 4 months ago
- ☆21 Updated 2 weeks ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆56 Updated last year
- ☆16 Updated 3 years ago
- Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023… ☆29 Updated last year
- Automatic metrics for GEM tasks ☆61 Updated 2 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆48 Updated 7 months ago
- ☆24 Updated this week
- ☆38 Updated 7 months ago
- ☆22 Updated 2 years ago