dell-research-harvard / NEWS-COPY
Noise-robust de-duplication at scale
☆20 Updated 2 years ago
Alternatives and similar repositories for NEWS-COPY
Users who are interested in NEWS-COPY are comparing it to the libraries listed below:
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 Updated 3 years ago
- An implementation of GrASP (Shnarch et al., 2017) ☆21 Updated 3 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 Updated 2 years ago
- ☆14 Updated last week
- ☆53 Updated last year
- Multilingual Open Text ☆25 Updated 4 months ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆54 Updated 2 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆29 Updated 3 years ago
- A survey of corpora for Germanic low-resource languages and dialects ☆25 Updated 9 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 Updated 3 years ago
- ☆15 Updated 7 years ago
- ☆22 Updated 8 months ago
- [COLING 2022]: CommunityLM: Probing Partisan Worldviews from Language Models ☆14 Updated 2 years ago
- Code and data for "Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words" ☆16 Updated 4 years ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 Updated 2 years ago
- MultiCite code and data. Models are available on Huggingface. ☆31 Updated 3 years ago
- ☆27 Updated 7 months ago
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ☆49 Updated 3 years ago
- PropSegmEnt is an annotated dataset for segmenting English text into propositions, and recognizing proposition-level entailment relations… ☆22 Updated 2 years ago
- A Python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆28 Updated last year
- ☆13 Updated 3 years ago
- One-stop shop for running and fine-tuning transformer-based language models for retrieval ☆59 Updated last week
- GisPy: A Tool for Measuring Gist Inference Score in Text https://aclanthology.org/2022.wnu-1.5/ ☆13 Updated last year
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning ☆14 Updated 3 months ago
- Twitter dataset for 2022 Russian and Ukrainian crisis ☆48 Updated 2 years ago
- ☆10 Updated 11 months ago
- This repository provides the source code used to automatically generate the book summarization datasets described in the paper titled "Ec… ☆11 Updated 5 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆21 Updated 2 months ago
- Automatically detect errors in annotated corpora. ☆47 Updated 2 years ago