dell-research-harvard / NEWS-COPY
Noise-robust de-duplication at scale
☆20Updated 2 years ago
Alternatives and similar repositories for NEWS-COPY
Users who are interested in NEWS-COPY are comparing it to the libraries listed below
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data…☆90Updated 2 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆12Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"☆14Updated 2 years ago
- ☆27Updated 5 months ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.☆54Updated last year
- An easy-to-use API for analyzing INCEpTION annotation projects.☆17Updated last year
- ☆13Updated 3 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆21Updated last month
- [COLING 2022]: CommunityLM: Probing Partisan Worldviews from Language Models☆14Updated 2 years ago
- This repository provides the source code used to automatically generate the book summarization datasets described in the paper titled "Ec…☆11Updated 3 months ago
- The multilingual language model for Switzerland☆27Updated last year
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu…☆41Updated 3 years ago
- ☆10Updated 10 months ago
- ☆27Updated 3 years ago
- A survey of corpora for Germanic low-resource languages and dialects☆25Updated 8 months ago
- A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering☆44Updated 3 years ago
- Code for "Dynamic Contextualized Word Embeddings"☆31Updated 3 years ago
- Code and data for "Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words"☆16Updated 3 years ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2…☆69Updated 2 years ago
- An implementation of GrASP (Shnarch et al., 2017)☆21Updated 2 years ago
- ☆14Updated 2 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Updated 3 years ago
- ☆66Updated 2 years ago
- multimodal document analysis☆165Updated last year
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset.☆47Updated 2 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/☆87Updated 2 months ago
- Automatically detect errors in annotated corpora.☆47Updated last year
- ☆9Updated last year
- A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, …☆92Updated last year
- A software for transferring pre-trained English models to foreign languages☆18Updated 2 years ago