webis-de / webis-tldr-17-corpus
Code for constructing TLDR corpus from Reddit dataset
☆27 Updated 4 years ago
Alternatives and similar repositories for webis-tldr-17-corpus
Users that are interested in webis-tldr-17-corpus are comparing it to the libraries listed below
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆67 Updated 2 years ago
- A dataset for pretraining language models targeted for legal tasks. ☆140 Updated 3 years ago
- ☆92 Updated 3 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆35 Updated last year
- The AI Knowledge Editor ☆186 Updated 3 years ago
- multimodal document analysis ☆166 Updated 3 weeks ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper. ☆12 Updated 2 years ago
- The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model. ☆37 Updated 3 years ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆85 Updated 2 years ago
- This is the code for loading the SenseBERT model, described in our paper from ACL 2020. ☆46 Updated 2 years ago
- ☆44 Updated last year
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆89 Updated last year
- GenieNLP: A versatile codebase for any NLP task ☆88 Updated last year
- Code for Stage-wise Fine-tuning for Graph-to-Text Generation ☆26 Updated 2 years ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆178 Updated 2 years ago
- Developing tools to automatically analyze datasets ☆75 Updated last year
- Documentation effort for the BookCorpus dataset ☆34 Updated 4 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 Updated 2 years ago
- Tools for managing datasets for governance and training. ☆87 Updated 2 weeks ago
- LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development ☆20 Updated 2 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 4 years ago
- ☆78 Updated 2 years ago
- Embedding Recycling for Language models ☆38 Updated 2 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 3 years ago
- This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions. ☆125 Updated last year
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆74 Updated last year
- Intelligence Task Ontology (ITO) ☆75 Updated 3 years ago
- ☆14 Updated last year
- One stop shop for all things carp ☆59 Updated 3 years ago
- TimeLMs: Diachronic Language Models from Twitter ☆111 Updated last year