skywalker023 / fantom
💻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"
☆51 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for fantom
- ☆48 · Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆41 · Updated 10 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc… ☆28 · Updated 5 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆44 · Updated 6 months ago
- ☆26 · Updated last year
- ☆27 · Updated 8 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆37 · Updated last month
- Critique-out-Loud Reward Models ☆38 · Updated last month
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆66 · Updated 6 months ago
- SILO Language Models code repository ☆80 · Updated 9 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆108 · Updated last year
- ☆23 · Updated 11 months ago
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 · Updated 2 months ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆97 · Updated last year
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning ☆44 · Updated last month
- ☆44 · Updated 2 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 · Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆111 · Updated 2 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆40 · Updated 4 months ago
- ☆9 · Updated 2 months ago
- ☆11 · Updated 2 years ago
- DEMix Layers for Modular Language Modeling ☆53 · Updated 3 years ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆35 · Updated last year
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models ☆93 · Updated 2 years ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆35 · Updated last year
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆63 · Updated last year
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆55 · Updated last year
- Benchmarking Generalization to New Tasks from Natural Language Instructions ☆25 · Updated 3 years ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆61 · Updated 7 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 · Updated 9 months ago