skywalker023 / fantom
👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"
☆52 Updated 8 months ago
Alternatives and similar repositories for fantom:
Users who are interested in fantom are comparing it to the repositories listed below.
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆59 Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆42 Updated 2 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ☆31 Updated 8 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆42 Updated last year
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning ☆47 Updated 4 months ago
- SILO Language Models code repository ☆81 Updated 11 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆113 Updated 5 months ago
- ☆27 Updated 11 months ago
- [EMNLP 2023] Official repository for Dialogue Chain-of-Thought Distillation (DONUT & DOCTOR) ☆10 Updated last year
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆48 Updated 9 months ago
- ☆20 Updated last year
- ☆17 Updated 4 months ago
- ☆10 Updated 5 months ago
- ☆40 Updated last week
- ☆50 Updated last year
- ☆27 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆112 Updated last year
- Critique-out-Loud Reward Models ☆51 Updated 4 months ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆70 Updated 9 months ago
- [TACL 2024] Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis ☆10 Updated 3 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated 11 months ago
- ☆23 Updated last year
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization ☆29 Updated 5 months ago
- ☆78 Updated 7 months ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ☆34 Updated 2 months ago
- ☆28 Updated last year
- ☆44 Updated 5 months ago
- A unified benchmark for math reasoning ☆87 Updated 2 years ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆99 Updated last year