[SIGIR 2022] Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval
☆201 Jan 4, 2023 Updated 3 years ago
Alternatives and similar repositories for Multi-CPR
Users who are interested in Multi-CPR are comparing it to the repositories listed below
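- T2Ranking: A large-scale Chinese benchmark for passage ranking. ☆162 Jul 3, 2023 Updated 2 years ago
- Second place in the "Alibaba Lingjie" Wentian Engine E-commerce Search Algorithm Competition. A two-stage text matching algorithm for the e-commerce domain. ☆56 Jul 28, 2022 Updated 3 years ago
- Hybrid List Aware Transformer Reranking ☆19 Oct 25, 2022 Updated 3 years ago
- Unofficial baseline for the Tianchi Alibaba Lingjie Wentian Engine E-commerce Search Algorithm Competition, also known as "NLP from beginner to 22/2771". ☆92 Jun 29, 2022 Updated 3 years ago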
- 🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models… ☆783 Dec 19, 2023 Updated 2 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆21 Jul 31, 2023 Updated 2 years ago
- Code and data of the EMNLP 2022 Main Conference paper "Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Nega… ☆18 Mar 25, 2024 Updated last year
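- 3,000,000+ semantic understanding and matching dataset. Can be used for unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pretrained models for Chinese. ☆312 Oct 11, 2022 Updated 3 years ago
- Simple experiments with SimCSE on Chinese tasks ☆606 Aug 7, 2023 Updated 2 years ago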
- Baseline Systems of DuReader Dataset ☆1,166 May 26, 2022 Updated 3 years ago
- Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025. ☆727 Jan 26, 2026 Updated last month
- [EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821 ☆3,641 Oct 16, 2024 Updated last year
- QBQTC: a large-scale search matching dataset ☆86 Dec 12, 2021 Updated 4 years ago
- Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval ☆26 Aug 7, 2023 Updated 2 years ago
- Dense Passage Retriever - is a set of tools and models for open domain Q&A task. ☆1,860 Apr 6, 2023 Updated 2 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking ☆74 Jan 4, 2023 Updated 3 years ago
- EMNLP 2021 - Pre-training architectures for dense retrieval ☆256 Mar 18, 2022 Updated 3 years ago
- Implementation of SimCSE + ESimCSE on Chinese datasets ☆191 May 21, 2022 Updated 3 years ago
- SIGIR 2021: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling ☆60 Jul 11, 2021 Updated 4 years ago
- CCL2024 Chinese Essay Rhetoric Recognition and Understanding ☆17 Oct 1, 2024 Updated last year
- ☆880 May 24, 2024 Updated last year
- ☆217 Dec 7, 2022 Updated 3 years ago
- Chinese machine reading comprehension dataset ☆109 Mar 29, 2021 Updated 4 years ago
- text embedding ☆147 Sep 18, 2023 Updated 2 years ago
- unified embedding model ☆876 Sep 1, 2023 Updated 2 years ago
- Fine-grained Entity Typing / Fine-grained Entity Classification ☆12 Apr 19, 2018 Updated 7 years ago
- ☆16 Jul 29, 2022 Updated 3 years ago
- Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Gr… ☆565 Jun 9, 2023 Updated 2 years ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie… ☆49 Apr 25, 2022 Updated 3 years ago
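- FewCLUE: a few-shot learning evaluation benchmark, Chinese edition ☆518 Sep 21, 2022 Updated 3 years ago
- Sentence matching models, including unsupervised SimCSE, ESimCSE, and PromptBERT, and supervised SBERT and CoSENT. ☆98 Oct 29, 2022 Updated 3 years ago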
- A multilingual version of MS MARCO passage ranking dataset ☆147 Oct 19, 2023 Updated 2 years ago
- YuLan-IR: Information Retrieval Boosted LMs ☆220 Mar 4, 2024 Updated last year
- ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion ☆37 Jul 25, 2024 Updated last year
- A toolkit for asynchronously validating dense retriever checkpoints during training. ☆27 Aug 10, 2023 Updated 2 years ago
- Data for paper "CC-Riddle: A Question Answering Dataset of Chinese Character Riddles": https://arxiv.org/abs/2206.13778 ☆20 Aug 19, 2023 Updated 2 years ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large scale dataset focused on machine reading comprehension, question answering, … ☆339 Jun 12, 2023 Updated 2 years ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆2,087 Oct 16, 2025 Updated 4 months ago
- A pretraining-based sentence embedding generation tool ☆138 Mar 30, 2023 Updated 2 years ago