dell-research-harvard / NEWS-COPY
Noise-robust de-duplication at scale
☆19 Updated 2 years ago
Alternatives and similar repositories for NEWS-COPY
Users who are interested in NEWS-COPY are comparing it to the libraries listed below.
- Learning from Neighbors: Unsupervised Text Classification ☆17 Updated 2 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆12 Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 Updated 2 years ago
- Package to extract connotation frames ☆85 Updated last year
- MultiCite code and data. Models are available on Huggingface. ☆32 Updated 3 years ago
- Data for the HIPE 2022 shared task. ☆18 Updated last year
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 Updated last year
- Compass-aligned Distributional Embeddings. Align embeddings from different corpora ☆39 Updated 2 years ago
- Code for the paper "Modeling Information Change in Science Communication with Semantically Matched Paraphrases" from EMNLP 2022 ☆13 Updated 2 years ago
- Code for "Dynamic Contextualized Word Embeddings" ☆31 Updated 3 years ago
- This repository provides the source code used to automatically generate the book summarization datasets described in the paper titled "Ec… ☆11 Updated 2 months ago
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆90 Updated 2 years ago
- An easy-to-use API for analyzing INCEpTION annotation projects. ☆17 Updated last year
- MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer ☆37 Updated 3 years ago
- A Python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆27 Updated 9 months ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 Updated 3 years ago
- Code for the paper "Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora", ACL 2020. ☆18 Updated 4 years ago
- Repository for the paper "Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions" ☆17 Updated last year
- ParaNames: A multilingual resource for parallel names ☆34 Updated last year
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ☆31 Updated 2 years ago
- Neural Language Models for Historical Research ☆26 Updated 8 months ago
- A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, … ☆85 Updated last year