JohnGiorgi/DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!

☆377

Alternatives and similar repositories for DeCLUTR

Users that are interested in DeCLUTR are comparing it to the libraries listed below.

seonghyeonye / EfficientCL
[EMNLP 2021] Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning
☆17 · Jun 28, 2025 · Updated last year
amazon-science / sentence-representations
☆79 · Jul 11, 2022 · Updated 4 years ago
facebookresearch / SentEval
A python tool for evaluating the quality of sentence embeddings.
☆2,110 · Mar 19, 2024 · Updated 2 years ago
princeton-nlp / DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o…
☆607 · Jun 15, 2022 · Updated 4 years ago
princeton-nlp / SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
☆3,655 · Oct 16, 2024 · Updated last year
FreddeFrallan / Contrastive-Tension
State of the art Semantic Sentence Embeddings
☆100 · May 22, 2022 · Updated 4 years ago
rrmenon10 / ADAPET
[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training
☆152 · Jun 10, 2022 · Updated 4 years ago
MichaelZhouwang / Sequence_Span_Rewriting
Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
☆17 · Nov 30, 2021 · Updated 4 years ago
timoschick / pet
This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
☆1,625 · Jun 12, 2023 · Updated 3 years ago
yym6472 / ConSERT
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
☆542 · Dec 10, 2021 · Updated 4 years ago
luyug / Condenser
EMNLP 2021 - Pre-training architectures for dense retrieval
☆256 · Mar 18, 2022 · Updated 4 years ago
studio-ousia / bpr
Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering
☆175 · Jun 6, 2021 · Updated 5 years ago
princeton-nlp / LM-BFF
[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
☆727 · Aug 29, 2022 · Updated 3 years ago
allenai / dont-stop-pretraining
Code associated with the Don't Stop Pretraining ACL 2020 paper
☆543 · Nov 15, 2021 · Updated 4 years ago
MurtyShikhar / ExpBERT
Code for our ACL '20 paper "Representation Engineering with Natural Language Explanations"
☆29 · Jun 15, 2020 · Updated 6 years ago
allenai / specter
SPECTER: Document-level Representation Learning using Citation-informed Transformers
☆586 · Jun 12, 2023 · Updated 3 years ago
facebookresearch / anli
Adversarial Natural Language Inference Benchmark
☆402 · May 12, 2022 · Updated 4 years ago
facebookresearch / KILT
Library for Knowledge Intensive Language Tasks
☆978 · Mar 31, 2022 · Updated 4 years ago
xbdxwyh / mocose
☆11 · Feb 14, 2023 · Updated 3 years ago
princeton-nlp / OptiPrompt
[NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240
☆168 · Oct 7, 2022 · Updated 3 years ago
GEM-benchmark / NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
☆786 · May 19, 2024 · Updated 2 years ago
richarddwang / electra_pytorch
Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated !)
☆332 · Jan 10, 2024 · Updated 2 years ago
voidism / DiffCSE
Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"
☆297 · Jul 12, 2026 · Updated last week
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,923 · Updated this week
castorini / docTTTTTquery
docTTTTTquery document expansion model
☆377 · Mar 25, 2023 · Updated 3 years ago
princeton-nlp / MADE
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering
☆68 · Nov 26, 2021 · Updated 4 years ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,772 · May 26, 2026 · Updated last month
google-research / xtreme
XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…
☆651 · Jan 4, 2023 · Updated 3 years ago
QData / TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs…
☆3,450 · Apr 17, 2026 · Updated 3 months ago
bigscience-workshop / t-zero
Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
☆463 · Nov 5, 2022 · Updated 3 years ago
microsoft / ANCE
A novel embedding training algorithm leveraging ANN search and achieved SOTA retrieval on Trec DL 2019 and OpenQA benchmarks
☆385 · Jan 6, 2026 · Updated 6 months ago
PathwayCommons / semantic-search
A simple semantic search engine for scientific papers.
☆28 · Sep 14, 2023 · Updated 2 years ago
microsoft / COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
☆118 · Jul 25, 2023 · Updated 2 years ago
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆359 · Feb 22, 2022 · Updated 4 years ago
kandorm / CLINE
Lexically Error Correction BERT.
☆49 · Jun 20, 2021 · Updated 5 years ago
amzn / trans-encoder
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
☆133 · Jul 2, 2026 · Updated 2 weeks ago
allenai / longformer
Longformer: The Long-Document Transformer
☆2,201 · Feb 8, 2023 · Updated 3 years ago
MilaNLProc / contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher…
☆1,272 · Jul 24, 2025 · Updated 11 months ago
allenai / naacl2021-longdoc-tutorial
☆343 · Aug 3, 2021 · Updated 4 years ago