☆17Dec 11, 2024Updated last year
Alternatives and similar repositories for ClueWeb22
Users that are interested in ClueWeb22 are comparing it to the libraries listed below.
- Web archiving utility library☆11Mar 11, 2026Updated last week
- TREC-COVID results - this is a mirror of data on the TREC website in a more convenient format.☆15Aug 31, 2020Updated 5 years ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073☆31Jul 9, 2024Updated last year
- ☆16Mar 25, 2022Updated 3 years ago
- ☆17Jul 18, 2022Updated 3 years ago
- [ACL 2024 Oral] This is the code repo for our ACL’24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…☆39Jun 30, 2024Updated last year
- [CIKM 2023 Oral] This is the code repo for our CIKM’23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…☆40Mar 17, 2024Updated 2 years ago
- Generative Reranker PyTerrier☆18Dec 1, 2025Updated 3 months ago
- A robust web archive analytics toolkit☆134Oct 15, 2025Updated 5 months ago
- Portal Tutorial☆11Feb 3, 2018Updated 8 years ago
- ☆24Oct 23, 2020Updated 5 years ago
- Zunda: Japanese Enhanced Modality Analyzer client for Python.☆10Nov 30, 2019Updated 6 years ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels☆346Dec 16, 2024Updated last year
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023)☆15Jul 24, 2023Updated 2 years ago
- Assignments for AML course @ UvA. Fall 2017☆14Nov 22, 2017Updated 8 years ago
- Based on the paper "Chain-of-Retrieval Augmented Generation"☆14Mar 29, 2025Updated 11 months ago
- ☆13Feb 5, 2022Updated 4 years ago
- Toolkit for domain-specific information retrieval experimentation☆19Feb 24, 2026Updated 3 weeks ago
- Resources for the Tutorial on "Utilizing Knowledge Bases in Text-centric Information Retrieval"☆25Sep 18, 2016Updated 9 years ago
- ☆12May 17, 2022Updated 3 years ago
- A library for open domain query facet extraction and generation☆16Apr 24, 2024Updated last year
- ☆39Jul 25, 2024Updated last year
- ☆15Jun 9, 2018Updated 7 years ago
- ☆13Dec 21, 2021Updated 4 years ago
- ☆10Jan 12, 2018Updated 8 years ago
- AllenNLP integration for Shiba: Japanese CANINE model☆12Jun 26, 2021Updated 4 years ago
- [EMNLP 2025 Findings] Familiarity-aware Evidence Compression for Retrieval Augmented Generation☆14Aug 20, 2025Updated 7 months ago
- Evaluation tools shared across anserini, pyserini, and pygaggle☆35Updated this week
- WebConf 2020 paper Leading Conversational Search by Suggesting Useful Questions☆33May 4, 2020Updated 5 years ago
- ☆18Mar 23, 2025Updated last year
- Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval☆26Aug 7, 2023Updated 2 years ago
- Facebook link prediction Kaggle challenge.☆15Aug 10, 2014Updated 11 years ago
- Trials of pre-trained BERT models for the medical domain in Japanese.☆12Nov 21, 2020Updated 5 years ago
- ☆15Oct 10, 2021Updated 4 years ago
- Tools for the TREC CAsT benchmark☆30Dec 15, 2022Updated 3 years ago
- ☆13Jan 20, 2023Updated 3 years ago
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆229Aug 28, 2024Updated last year
- A toolkit for end-to-end neural ad hoc retrieval☆97Aug 20, 2024Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆19Jun 27, 2024Updated last year