skywalker023 / fantom
👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"
☆53Updated 9 months ago
Alternatives and similar repositories for fantom:
Users that are interested in fantom are comparing it to the libraries listed below
- IntructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆31Updated 9 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆42Updated last year
- ☆17Updated 5 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆44Updated 3 months ago
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models"☆41Updated last year
- ☆40Updated last month
- ☆27Updated last year
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆48Updated 10 months ago
- ☆24Updated last year
- Inspecting and Editing Knowledge Representations in Language Models☆112Updated last year
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization☆29Updated 6 months ago
- ☆50Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models"☆73Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners☆113Updated 6 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- SILO Language Models code repository☆81Updated last year
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca…☆59Updated last year
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning☆47Updated 5 months ago
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following☆79Updated 6 months ago
- ☆29Updated last year
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators"☆34Updated 3 months ago
- ☆34Updated 11 months ago
- ☆10Updated 6 months ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models☆70Updated 10 months ago
- ☆61Updated last year
- ☆44Updated 6 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models☆44Updated last month
- The Prism Alignment Project☆69Updated 11 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆97Updated last year
- ☆20Updated last year