facebookresearch / PerSE
Personalized Story Evaluation Model
☆18Updated 2 years ago
Alternatives and similar repositories for PerSE
Users who are interested in PerSE are comparing it to the libraries listed below.
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models☆74Updated last year
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"☆42Updated last year
- [NeurIPS 2025] "Reasoning Models Better Express Their Confidence"☆22Updated 2 months ago
- contrastive decoding☆206Updated 3 years ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut…☆23Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model☆71Updated 3 years ago
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering.☆16Updated 2 years ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆81Updated last year
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision☆95Updated last year
- ☆88Updated 3 years ago
- ☆78Updated last year
- ☆34Updated last month
- ☆103Updated 2 years ago
- ☆15Updated last year
- AbstainQA, ACL 2024☆28Updated this week
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆50Updated last year
- ☆27Updated 2 years ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions☆119Updated last year
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆116Updated 2 years ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆63Updated 2 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.☆35Updated last year
- ☆22Updated 3 years ago
- ☆84Updated last week
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆72Updated 3 weeks ago
- ☆43Updated last year
- ☆68Updated 2 years ago
- BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages☆45Updated 5 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models".☆42Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆64Updated last year
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆127Updated last year