lilakk/BooookScore

A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper, "BooookScore: A systematic exploration of book-length summarization in the era of LLMs".

☆130

Alternatives and similar repositories for BooookScore

Users who are interested in BooookScore are comparing it to the libraries listed below.

mungg / FABLES
☆61 · Sep 24, 2024 · Updated last year
amazon-science / tofueval
☆32 · May 10, 2024 · Updated 2 years ago
salesforce / booksum
☆199 · Jun 25, 2026 · Updated 3 weeks ago
Yale-LILY / ROSE
☆41 · Jun 7, 2023 · Updated 3 years ago
nyu-mll / SQuALITY
Query-focused summarization data
☆44 · Feb 17, 2023 · Updated 3 years ago
aryopg / decore
Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination"
☆30 · Dec 18, 2024 · Updated last year
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆261 · Sep 12, 2025 · Updated 10 months ago
martiansideofthemoon / relic-retrieval
Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).
☆20 · May 14, 2022 · Updated 4 years ago
LuLuLuyi / LongHeads
[EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor
☆32 · Apr 8, 2024 · Updated 2 years ago
lilakk / BLEUBERI
Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
☆32 · Jun 5, 2025 · Updated last year
Arlenelalala / ArxivPaper
Scheduled scraper for daily arXiv papers
☆13 · May 22, 2023 · Updated 3 years ago
init0xyz / AdaCQR
Implementation of AdaCQR (COLING 2025)
☆15 · Dec 30, 2024 · Updated last year
Alsace08 / SumCoT
[ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"
☆54 · Jan 21, 2024 · Updated 2 years ago
marzenakrp / nocha
☆54 · Oct 24, 2024 · Updated last year
swiseman / neighbor-splicing
☆11 · Jan 2, 2022 · Updated 4 years ago
anthonywchen / MOCHA
Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".
☆16 · May 3, 2022 · Updated 4 years ago
tingofurro / summac
Codebase, data and models for the SummaC paper in TACL
☆110 · Jan 30, 2025 · Updated last year
sheryc / resonance_rope
[ACL 24 Findings] Implementation of Resonance RoPE and the PosGen synthetic dataset.
☆24 · Mar 5, 2024 · Updated 2 years ago
SimengSun / revisit-nplm
☆12 · Sep 1, 2021 · Updated 4 years ago
google-deepmind / loft
LOFT: A 1 Million+ Token Long-Context Benchmark
☆237 · Apr 13, 2026 · Updated 3 months ago
lilakk / PostMark
Official repository for "PostMark: A Robust Blackbox Watermark for Large Language Models"
☆29 · Aug 30, 2024 · Updated last year
aryopg / mmlu-redux
☆32 · Nov 9, 2024 · Updated last year
nightdessert / Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆241 · Aug 2, 2024 · Updated last year
KwanWaiChung / M4LE
Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
☆23 · Jul 27, 2024 · Updated last year
Liyan06 / AggreFact
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (ACL 2023)
☆28 · Mar 26, 2024 · Updated 2 years ago
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆450 · Apr 13, 2025 · Updated last year
shtoshni / g2p
Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models
☆15 · Feb 20, 2019 · Updated 7 years ago
tencent-ailab / Lodoss
☆33 · May 16, 2023 · Updated 3 years ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 · Aug 15, 2024 · Updated last year
vaguenebula / AlpacaDataReflect
An experiment to see if chatgpt can improve the output of the stanford alpaca dataset
☆12 · Mar 29, 2023 · Updated 3 years ago
explosion / curated-tokenizers
Lightweight piece tokenization library
☆12 · Apr 15, 2024 · Updated 2 years ago
cognitivetech / ollama-ebook-summary
LLM for Long Text Summary (Comprehensive Bulleted Notes)
☆624 · Jul 5, 2025 · Updated last year
HarlynDN / WebCiteS
[ACL'24] WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
☆13 · Sep 11, 2024 · Updated last year
princeton-nlp / AutoCompressors
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
☆337 · Sep 9, 2024 · Updated last year
tencent-ailab / OASum
☆15 · Oct 20, 2023 · Updated 2 years ago
princeton-nlp / HELMET
The HELMET Benchmark
☆220 · Apr 17, 2026 · Updated 3 months ago
armingh2000 / FactScoreLite
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package bu…
☆14 · Apr 25, 2024 · Updated 2 years ago
OpenBMB / InfiniteBench
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
☆387 · Sep 25, 2024 · Updated last year
bigai-nlco / LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
☆199 · Oct 8, 2024 · Updated last year