StonyBrookNLP / musique
Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022
☆95 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for musique
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆93 · Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆60 · Updated 3 months ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆71 · Updated 2 weeks ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated last year
- Companion repo for "Evaluating Verifiability in Generative Search Engines". ☆80 · Updated last year
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆61 · Updated 3 months ago
- Do Large Language Models Know What They Don’t Know? ☆85 · Updated this week
- Official code for the TACL 2021 paper "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆64 · Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆42 · Updated last year
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆56 · Updated 2 weeks ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆54 · Updated 10 months ago
- Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering" ☆80 · Updated last year
- Token-level Reference-free Hallucination Detection ☆92 · Updated last year
- This project maintains a reading list for general text generation tasks ☆65 · Updated 2 years ago
- [NAACL 2024] End-to-End Beam Retrieval for Multi-Hop Question Answering ☆76 · Updated 6 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆59 · Updated 6 months ago
- A unified benchmark for math reasoning ☆87 · Updated last year
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆30 · Updated last year
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆71 · Updated 10 months ago
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆39 · Updated 10 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆84 · Updated 4 months ago
- Repository for Decomposed Prompting ☆82 · Updated 11 months ago