ict-bigdatalab/awesome-pretrained-models-for-information-retrieval

ict-bigdatalab / awesome-pretrained-models-for-information-retrieval

A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).

☆677

Alternatives and similar repositories for awesome-pretrained-models-for-information-retrieval

Users interested in awesome-pretrained-models-for-information-retrieval compare it to the repositories listed below.

Albert-Ma / PROP
WSDM'2021, PROP and SIGIR'2021, B-PROP
☆110 · May 18, 2023 · Updated 3 years ago
caiyinqiong / Semantic-Retrieval-Models
A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Re…
☆341 · Jun 17, 2023 · Updated 3 years ago
Albert-Ma / COSTA
SIGIR'2022, Pre-train a Discriminative Text Encoder for Dense Retrieval via Contrastive Span Prediction
☆27 · Nov 8, 2022 · Updated 3 years ago
gabriben / awesome-generative-information-retrieval
☆728 · Oct 7, 2025 · Updated 9 months ago
texttron / tevatron
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
☆743 · Jul 18, 2026 · Updated last week
castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,104 · Jul 16, 2026 · Updated last week
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,255 · Oct 16, 2025 · Updated 9 months ago
RUCAIBox / DenseRetrieval
☆220 · Dec 7, 2022 · Updated 3 years ago
luyug / Condenser
EMNLP 2021 - Pre-training architectures for dense retrieval
☆256 · Mar 18, 2022 · Updated 4 years ago
RUC-NLPIR / LLM4IR-Survey
This is the repo for the survey of LLM4IR.
☆540 · Nov 13, 2025 · Updated 8 months ago
castorini / pygaggle
A gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
☆354 · Dec 21, 2023 · Updated 2 years ago
thunlp / OpenMatch
An Open-Source Package for Information Retrieval.
☆442 · Oct 7, 2022 · Updated 3 years ago
Chriskuei / awesome-generative-retrieval-models
A curated list of awesome papers related to generative retrieval models.
☆54 · May 31, 2023 · Updated 3 years ago
allenai / ir_datasets
Provides a common interface to many IR ranking datasets.
☆390 · May 28, 2026 · Updated 2 months ago
zhengyima / Anchors
Source code of CIKM2021 Paper 'Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need'
☆16 · Aug 30, 2021 · Updated 4 years ago
microsoft / AR2
☆71 · Jun 16, 2022 · Updated 4 years ago
DI4IR / SIGIR2021
☆24 · Jun 28, 2023 · Updated 3 years ago
luyug / COIL
NAACL2021 - COIL Contextualized Lexical Retriever
☆158 · Jul 27, 2021 · Updated 5 years ago
facebookresearch / DPR
Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.
☆1,870 · Apr 6, 2023 · Updated 3 years ago
castorini / rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
☆611 · Jul 19, 2026 · Updated last week
microsoft / ANCE
A novel embedding training algorithm that leverages ANN search and achieved SOTA retrieval on TREC DL 2019 and OpenQA benchmarks
☆386 · Jan 6, 2026 · Updated 6 months ago
ict-bigdatalab / CorpusBrain
CIKM 2022: CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks
☆34 · Aug 31, 2022 · Updated 3 years ago
sebastian-hofstaetter / matchmaker
Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch
☆265 · Jan 27, 2023 · Updated 3 years ago
FreedomIntelligence / DPTDR
Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
☆26 · Aug 7, 2023 · Updated 2 years ago
castorini / dhr
Dense hybrid representations for text retrieval
☆65 · Apr 3, 2023 · Updated 3 years ago
ielab / asyncval
A toolkit for asynchronously validating dense retriever checkpoints during training.
☆27 · Aug 10, 2023 · Updated 2 years ago
castorini / anserini
Anserini is a Lucene toolkit for reproducible information retrieval research
☆1,151 · Updated this week
sebastian-hofstaetter / teaching
Open-Source Information Retrieval Courses @ TU Wien
☆706 · Jun 12, 2023 · Updated 3 years ago
sunnweiwei / MAIR
MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. Evaluate your retrieval models on 126 diverse tasks. [EMNLP 2024]
☆28 · Nov 3, 2024 · Updated last year
castorini / docTTTTTquery
docTTTTTquery document expansion model
☆377 · Mar 25, 2023 · Updated 3 years ago
ielab / llm-rankers
Document Ranking with Large Language Models.
☆210 · Feb 14, 2026 · Updated 5 months ago
sunnweiwei / RankGPT
Is ChatGPT Good at Search? LLMs as Re-Ranking Agent [EMNLP 2023 Outstanding Paper Award]
☆669 · Mar 10, 2024 · Updated 2 years ago
jingtaozhan / DRhard
SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.
☆127 · Feb 15, 2022 · Updated 4 years ago
AdeDZY / DeepCT
DeepCT and HDCT use BERT to generate novel, context-aware bag-of-words term weights for documents and queries.
☆325 · May 9, 2021 · Updated 5 years ago
facebookresearch / SEAL
Search Engines with Autoregressive Language models
☆296 · Apr 4, 2023 · Updated 3 years ago
stanford-futuredata / ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
☆3,904 · Oct 14, 2025 · Updated 9 months ago
luyug / Reranker
Build Text Rerankers with Deep Language Models
☆265 · Feb 20, 2024 · Updated 2 years ago
ArvinZhuang / DSI-QG
The official repository for "Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation", Shen…
☆129 · Jul 9, 2023 · Updated 3 years ago
facebookresearch / contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
☆780 · Apr 7, 2023 · Updated 3 years ago