webis-de / webis-tldr-17-corpus
Code for constructing TLDR corpus from Reddit dataset
☆25 Updated 3 years ago
Alternatives and similar repositories for webis-tldr-17-corpus
Users who are interested in webis-tldr-17-corpus are comparing it to the libraries listed below.
- 💫 A spaCy package for Yohei Tamura's Rust tokenizations library ☆29 Updated last year
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆26 Updated 2 years ago
- Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21) ☆41 Updated 3 years ago
- The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model. ☆37 Updated 3 years ago
- NewsQuizQA is a quiz-style question-answer dataset used for generating quiz questions about the news ☆34 Updated 4 years ago
- MultiCite code and data. Models are available on Huggingface. ☆32 Updated 3 years ago
- Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statement… ☆16 Updated 3 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆33 Updated last year
- Source codes for the paper "Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints" ☆28 Updated 2 years ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆24 Updated 3 years ago
- StAtutory Reasoning Assessment ☆13 Updated 2 years ago
- Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21) ☆44 Updated 3 years ago
- ☆90 Updated 2 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆44 Updated last year
- Schema2QA Question Answering Dataset ☆18 Updated 2 years ago
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale ☆95 Updated 5 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- The Universal Anaphora Scorer ☆15 Updated 9 months ago
- ☆33 Updated 2 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆55 Updated 2 years ago
- ☆78 Updated last year
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 4 years ago
- Code for Stage-wise Fine-tuning for Graph-to-Text Generation ☆26 Updated 2 years ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆27 Updated 4 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 Updated 3 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 3 years ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 Updated 2 years ago
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics". ☆16 Updated 3 years ago
- The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an … ☆32 Updated last year
- ☆24 Updated 9 months ago