A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper, "BooookScore: A systematic exploration of book-length summarization in the era of LLMs".
☆130 · Oct 1, 2024 · Updated last year
Alternatives and similar repositories for BooookScore
Users who are interested in BooookScore are comparing it to the libraries listed below
- ☆60Sep 24, 2024Updated last year
- ☆33Dec 17, 2025Updated 2 months ago
- ☆32May 10, 2024Updated last year
- Query-focused summarization data☆44Feb 17, 2023Updated 3 years ago
- ☆198May 3, 2024Updated last year
- GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators☆47Dec 23, 2025Updated 2 months ago
- ☆39Jun 7, 2023Updated 2 years ago
- A reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction"☆22May 29, 2024Updated last year
- FRANK: Factuality Evaluation Benchmark☆59Dec 13, 2022Updated 3 years ago
- ☆16Jun 25, 2025Updated 8 months ago
- Implementation of AdaCQR(COLING 2025)☆13Dec 30, 2024Updated last year
- ☆11Jun 5, 2024Updated last year
- Lightweight piece tokenization library☆12Apr 15, 2024Updated last year
- FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package bu…☆13Apr 25, 2024Updated last year
- An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset☆12Mar 29, 2023Updated 2 years ago
- [ACL'24] WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations☆13Sep 11, 2024Updated last year
- The LM Contamination Index is a manually created database of contamination evidence for LMs.☆82Apr 11, 2024Updated last year
- [ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"☆53Jan 21, 2024Updated 2 years ago
- ☆15Oct 20, 2023Updated 2 years ago
- Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"☆16Updated this week
- Scheduled crawler for daily arXiv papers☆13May 22, 2023Updated 2 years ago
- Official Pytorch implementation of the paper: "Locally Shifted Attention With Early Global Integration"☆15Dec 20, 2021Updated 4 years ago
- Official demo repository for our ACL 2019 long paper "Generating Question-Answer Hierarchies".☆20Feb 13, 2026Updated 3 weeks ago
- ☆17Oct 18, 2022Updated 3 years ago
- Code for "Exponential Family Estimation via Adversarial Dynamics Embedding" (NeurIPS 2019)☆14Nov 26, 2019Updated 6 years ago
- BeHonest: Benchmarking Honesty in Large Language Models☆34Aug 15, 2024Updated last year
- [ACL 24 Findings] Implementation of Resonance RoPE and the PosGen synthetic dataset.☆24Mar 5, 2024Updated 2 years ago
- Class frequency estimation software package☆13Sep 1, 2019Updated 6 years ago
- ☆13Apr 24, 2022Updated 3 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…☆417Apr 13, 2025Updated 10 months ago
- Codebase, data and models for the SummaC paper in TACL☆109Jan 30, 2025Updated last year
- ☆33May 16, 2023Updated 2 years ago
- LOFT: A 1 Million+ Token Long-Context Benchmark☆226Feb 11, 2026Updated 3 weeks ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆76Jan 16, 2026Updated last month
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆195Oct 8, 2024Updated last year
- ☆18Nov 19, 2024Updated last year
- Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models☆15Feb 20, 2019Updated 7 years ago
- ☆20Jun 9, 2022Updated 3 years ago
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718☆378Sep 25, 2024Updated last year