webis-de / webis-tldr-17-corpus

Code for constructing the TLDR corpus from the Reddit dataset

☆27

Alternatives and similar repositories for webis-tldr-17-corpus

Users who are interested in webis-tldr-17-corpus are comparing it to the repositories listed below.

huggingface / disaggregators
🤗 Disaggregators: Curated data labelers for in-depth analysis.
☆67 Updated 2 years ago
Breakend / PileOfLaw
A dataset for pretraining language models targeted for legal tasks.
☆140 Updated 3 years ago
EleutherAI / openwebtext2
☆92 Updated 3 years ago
allenai / smashed
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…
☆35 Updated last year
facebookresearch / side
The AI Knowledge Editor
☆186 Updated 3 years ago
allenai / mmda
multimodal document analysis
☆166 Updated 3 weeks ago
amazon-science / wqa-multi-sentence-inference
This repository contains code used for our Multi Sentence Inference NAACL'22 paper.
☆12 Updated 2 years ago
google-research-datasets / NewSHead
The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.
☆37 Updated 3 years ago
EleutherAI / stackexchange-dataset
Python tools for processing the stackexchange data dumps into a text dataset for Language Models
☆85 Updated 2 years ago
AI21Labs / sense-bert
This is the code for loading the SenseBERT model, described in our paper from ACL 2020.
☆46 Updated 2 years ago
EleutherAI / semantic-memorization
☆44 Updated last year
google-research-datasets / seahorse
Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q…
☆89 Updated last year
stanford-oval / genienlp
GenieNLP: A versatile codebase for any NLP task
☆88 Updated last year
EagleW / Stage-wise-Fine-tuning
Code for Stage-wise Fine-tuning for Graph-to-Text Generation
☆26 Updated 2 years ago
huggingface / olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
☆178 Updated 2 years ago
huggingface / data-measurements-tool
Developing tools to automatically analyze datasets
☆75 Updated last year
jackbandy / bookcorpus-datasheet
Documentation effort for the BookCorpus dataset
☆34 Updated 4 years ago
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 Updated 2 years ago
bigscience-workshop / data_tooling
Tools for managing datasets for governance and training.
☆87 Updated 2 weeks ago
coastalcph / lexlms
LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development
☆20 Updated 2 years ago
malteos / legal-document-similarity
Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal …
☆32 Updated 4 years ago
leogao2 / lm_dataformat
☆78 Updated 2 years ago
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆38 Updated 2 years ago
weaviate / biggraph-wikidata-search-with-weaviate
Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine
☆31 Updated 3 years ago
johnbumgarner / wordhoard
This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
☆125 Updated last year
bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆74 Updated last year
OpenBioLink / ITO
Intelligence Task Ontology (ITO)
☆75 Updated 3 years ago
lucy3 / whos_filtered
☆14 Updated last year
EleutherAI / magiCARP
One stop shop for all things carp
☆59 Updated 3 years ago
cardiffnlp / timelms
TimeLMs: Diachronic Language Models from Twitter
☆111 Updated last year