lilakk / BooookScore
A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper, "BooookScore: A systematic exploration of book-length summarization in the era of LLMs".
☆115 Updated 4 months ago
Alternatives and similar repositories for BooookScore:
Users interested in BooookScore are comparing it to the libraries listed below:
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆125 Updated 11 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆145 Updated 2 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 Updated 6 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆102 Updated 4 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆128 Updated 3 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆135 Updated 3 months ago
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆162 Updated last year
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆100 Updated 7 months ago
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆86 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆112 Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆113 Updated 5 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆65 Updated 10 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆119 Updated 7 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆156 Updated this week
- Codebase accompanying the Summary of a Haystack paper. ☆74 Updated 4 months ago
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆66 Updated 6 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆121 Updated last month
- Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought" ☆93 Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆54 Updated last year
- PASTA: Post-hoc Attention Steering for LLMs ☆111 Updated 2 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆215 Updated 3 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆42 Updated 2 months ago