☆16Dec 11, 2024Updated last year
Alternatives and similar repositories for ClueWeb22
Users who are interested in ClueWeb22 are comparing it to the libraries listed below
- Web archiving utility library☆11Dec 3, 2025Updated 2 months ago
- Generative Reranker PyTerrier☆18Dec 1, 2025Updated 3 months ago
- ☆17Jul 18, 2022Updated 3 years ago
- TREC-COVID results - this is a mirror of data on the TREC website in a more convenient format.☆15Aug 31, 2020Updated 5 years ago
- ☆16Mar 25, 2022Updated 3 years ago
- Toolkit for domain-specific information retrieval experimentation☆19Updated this week
- [CIKM 2023 Oral] This is the code repo for our CIKM’23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…☆40Mar 17, 2024Updated last year
- Fusion for TREC run files with popular fusion techniques☆21Aug 26, 2022Updated 3 years ago
- ☆24Oct 23, 2020Updated 5 years ago
- Information Retrieval Relevance Judging System☆29Jan 17, 2022Updated 4 years ago
- Resources for the Tutorial on "Utilizing Knowledge Bases in Text-centric Information Retrieval"☆25Sep 18, 2016Updated 9 years ago
- A robust web archive analytics toolkit☆132Oct 15, 2025Updated 4 months ago
- Tools for the TREC CAsT benchmark☆28Dec 15, 2022Updated 3 years ago
- Evaluation tools shared across anserini, pyserini, and pygaggle☆35Updated this week
- ☆39Jul 25, 2024Updated last year
- Indri search implementation on top of Lucene search engine☆35Mar 12, 2024Updated last year
- Common Index File Format to support interoperability between open-source IR engines☆40Sep 19, 2024Updated last year
- WebConf 2020 paper Leading Conversational Search by Suggesting Useful Questions☆33May 4, 2020Updated 5 years ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning☆13Aug 17, 2023Updated 2 years ago
- Common Crawl fork of Apache Nutch☆40Updated this week
- A toolkit for end-to-end neural ad hoc retrieval☆97Aug 20, 2024Updated last year
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels☆346Dec 16, 2024Updated last year
- Code and data for the Walert large language model-based chatbot☆12Aug 14, 2025Updated 6 months ago
- Fake NEWS detector using LIAR dataset.☆11Aug 19, 2019Updated 6 years ago
- Security research organization dedicated to finding low hanging, critical, vulnerabilities.☆15May 12, 2022Updated 3 years ago
- Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory.☆11Updated this week
- Containerfile for the Vanilla OS Desktop+Nvidia image.☆16Feb 5, 2026Updated 3 weeks ago
- ☆10Jul 6, 2023Updated 2 years ago
- C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs☆11Jan 13, 2023Updated 3 years ago
- Streamline on-policy/off-policy distillation workflows in a few lines of code☆95Updated this week
- this is based on the paper Chain-of-Retrieval Augmented Generation☆14Mar 29, 2025Updated 11 months ago
- A Python Interface to Reproducibility Measures of System-Oriented IR Experiments☆11Dec 2, 2025Updated 3 months ago
- ☆10Jan 5, 2022Updated 4 years ago
- ☆13Jul 13, 2023Updated 2 years ago
- Via Text Density Simple Web Crawler With Go☆13Mar 19, 2023Updated 2 years ago
- A monolithic index that supports worst-case optimal joins (WCOJ) by providing all collation orders in a single redundancy eliminating dat…☆16Sep 18, 2025Updated 5 months ago
- this is a work about UpliftRec☆10Dec 10, 2024Updated last year
- Dataset from Tip of the Tongue Known-Item Retrieval (2021) paper.☆12Nov 4, 2021Updated 4 years ago
- Fair Benchmarks☆10Mar 14, 2019Updated 6 years ago