dell-research-harvard / NEWS-COPY
Noise-robust de-duplication at scale
☆20 · Updated 2 years ago
Alternatives and similar repositories for NEWS-COPY
Users interested in NEWS-COPY are comparing it to the libraries listed below.
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆12 · Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 · Updated 2 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 · Updated 2 years ago
- [COLING 2022]: CommunityLM: Probing Partisan Worldviews from Language Models ☆14 · Updated 2 years ago
- ☆14 · Updated last month
- Code and data for "Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words" ☆16 · Updated 3 years ago
- Package to extract connotation frames ☆85 · Updated last year
- ☆13 · Updated 3 years ago
- Repo for Aspire, a scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆54 · Updated last year
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆90 · Updated 2 years ago
- ☆10 · Updated 9 months ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ☆31 · Updated 2 years ago
- Evaluate language models using multiple choice items ☆13 · Updated 2 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆21 · Updated 2 weeks ago
- ☆22 · Updated 5 months ago
- INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ. ☆24 · Updated last year
- ☆66 · Updated last year
- MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer ☆37 · Updated 3 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 · Updated 4 years ago
- Software for transferring pre-trained English models to foreign languages ☆18 · Updated 2 years ago
- TimeLMs: Diachronic Language Models from Twitter ☆108 · Updated last year
- ☆27 · Updated 4 months ago
- MultiCite code and data. Models are available on Hugging Face. ☆32 · Updated 3 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆29 · Updated 3 years ago
- PropSegmEnt is an annotated dataset for segmenting English text into propositions, and recognizing proposition-level entailment relations… ☆19 · Updated 2 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆82 · Updated 10 months ago
- ☆16 · Updated 7 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Updated 3 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 3 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆44 · Updated 11 months ago