yuh-zha / AlignScore
ACL 2023 - AlignScore, a metric for factual consistency evaluation.
☆128 · Updated last year
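For context, a minimal usage sketch of the AlignScore metric is shown below. It assumes the `alignscore` package from this repository is installed and that a released checkpoint has been downloaded locally; the class name, constructor arguments, and `score` signature follow my reading of the repository's README and should be verified against the current code, and the checkpoint path is a placeholder.

```python
# Hedged sketch: assumes the alignscore package from yuh-zha/AlignScore is installed
# and a released checkpoint (e.g. AlignScore-base) has been downloaded locally.
# Argument names follow the repository README; verify against the current version.
from alignscore import AlignScore

scorer = AlignScore(
    model='roberta-base',                        # backbone the checkpoint was trained with
    batch_size=32,
    device='cuda:0',                             # or 'cpu'
    ckpt_path='/path/to/AlignScore-base.ckpt',   # placeholder path to the downloaded checkpoint
    evaluation_mode='nli_sp',                    # chunked NLI scoring mode described in the README
)

# Each claim is scored for factual consistency against its paired context; higher is better.
scores = scorer.score(
    contexts=['The quick brown fox jumps over the lazy dog.'],
    claims=['A fox jumped over a dog.'],
)
print(scores)  # list of floats, one per (context, claim) pair
```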
Alternatives and similar repositories for AlignScore
Users interested in AlignScore are comparing it to the libraries listed below.
- ☆179 · Updated 2 weeks ago
- Token-level Reference-free Hallucination Detection ☆94 · Updated last year
- Codebase, data and models for the SummaC paper in TACL ☆96 · Updated 4 months ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆202 · Updated last year
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆155 · Updated last month
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆251 · Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆104 · Updated 2 years ago
- ☆135 · Updated 5 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆129 · Updated last year
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆69 · Updated 10 months ago
- Scalable training for dense retrieval models. ☆298 · Updated 2 weeks ago
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆101 · Updated 2 years ago
- ☆283 · Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆162 · Updated last year
- Document Ranking with Large Language Models. ☆164 · Updated 3 weeks ago
- Code for Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks ☆57 · Updated last year
- A Survey of Attributions for Large Language Models ☆203 · Updated 10 months ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆353 · Updated 2 months ago
- RARR: Researching and Revising What Language Models Say, Using Language Models ☆47 · Updated 2 years ago
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆100 · Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆59 · Updated last year
- ☆109 · Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆124 · Updated 10 months ago
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 · Updated last year
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ☆53 · Updated last month
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)" ☆62 · Updated 2 years ago
- Companion repo for "Evaluating Verifiability in Generative Search Engines". ☆83 · Updated 2 years ago
- [ACL 2022] A hierarchical table dataset for question answering and data-to-text generation. ☆90 · Updated 3 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆185 · Updated 6 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆144 · Updated last month