hieudx149 / X-RetroMAE

Code for a RoBERTa version of RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder

☆10

Alternatives and similar repositories for X-RetroMAE

Users who are interested in X-RetroMAE are comparing it to the libraries listed below.

hooman650 / SupCL-Seq
Supervised Contrastive Learning for Downstream Optimized Sequence Representations
☆27 · Updated 3 years ago
JetRunner / LaPraDoR
🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie…
☆49 · Updated 3 years ago
oriram / spider
☆54 · Updated 2 years ago
castorini / dhr
Dense hybrid representations for text retrieval
☆62 · Updated 2 years ago
wzhouad / NLL-IE
Source code for paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021
☆55 · Updated 3 years ago
dmis-lab / TouR
Findings of ACL'2023: Optimizing Test-Time Query Representations for Dense Retrieval
☆30 · Updated last year
FreedomIntelligence / DPTDR
Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
☆25 · Updated last year
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Updated 3 years ago
ielab / PromptReps
Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
☆47 · Updated 6 months ago
izhx / uni-rep
Code for embedding and retrieval research.
☆16 · Updated last year
caskcsg / ir
ConTextual Mask Auto-Encoder for Dense Passage Retrieval
☆35 · Updated 6 months ago
manueldeprada / Pretraining-T5-PyTorch-Lightning
Collection of scripts to pretrain T5 in unsupervised text, using PyTorch Lightning. CORD-19 pretraining provided as example.
☆32 · Updated 4 years ago
OpenMatch / COCO-DR
[EMNLP 2022] This is the code repo for our EMNLP‘22 paper "COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contr…
☆49 · Updated last year
lyutyuh / structured-span-selector
A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti…
☆21 · Updated 2 years ago
thunlp / ConvDR
Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"
☆41 · Updated 3 years ago
yxuansu / Contrastive_Search_versus_Contrastive_Decoding
An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation
☆27 · Updated 11 months ago
sean0042 / Open_WikiTable
Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table
☆23 · Updated last year
sunlab-osu / ReasonBERT
Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021
☆29 · Updated 2 years ago
swj0419 / kNN_prompt
TBC
☆27 · Updated 2 years ago
OpenMatch / ANCE-Tele
Code and data of the EMNLP 2022 Main Conference paper "Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Nega…
☆18 · Updated last year
castorini / mr.tydi
Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.
☆75 · Updated 3 years ago
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆21 · Updated 9 months ago
gmftbyGMFTBY / Rep-Dropout
[NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
☆30 · Updated last year
andrejmiscic / simcls-pytorch
PyTorch reimplementation of the paper "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization"
☆16 · Updated 3 years ago
FUZHIYI / TACO
Code for the ACL 2022 paper "Contextual Representation Learning beyond Masked Language Modeling"
☆34 · Updated 2 years ago
uds-lsv / TOKEN-is-a-MASK
Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models"
☆14 · Updated 2 years ago
yixinL7 / SumLLM
Repo for "On Learning to Summarize with Large Language Models as References"
☆44 · Updated last year
project-miracl / hagrid
A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution
☆31 · Updated last year
jingtaozhan / disentangled-retriever
An easy-to-use Python toolkit for flexibly adapting various neural ranking models to a target domain.
☆59 · Updated 2 years ago
KoreaMGLEE / Concept-based-curriculum-masking
Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking
☆13 · Updated 2 years ago