☆17 Dec 11, 2024 Updated last year
Alternatives and similar repositories for ClueWeb22
Users that are interested in ClueWeb22 are comparing it to the libraries listed below.
- Web archiving utility library ☆11 Mar 11, 2026 Updated last month
- TREC-COVID results - this is a mirror of data on the TREC website in a more convenient format. ☆15 Aug 31, 2020 Updated 5 years ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆31 Jul 9, 2024 Updated last year
- ☆16 Mar 25, 2022 Updated 4 years ago
- ☆17 Jul 18, 2022 Updated 3 years ago
- Generative Reranker PyTerrier ☆18 Dec 1, 2025 Updated 5 months ago
- A robust web archive analytics toolkit ☆136 Apr 24, 2026 Updated last week
- Portal Tutorial ☆11 Feb 3, 2018 Updated 8 years ago
- ☆24 Oct 23, 2020 Updated 5 years ago
- This repo contains the code for Late Prompt Tuning. ☆12 Dec 22, 2025 Updated 4 months ago
- Zunda: Japanese Enhanced Modality Analyzer client for Python. ☆10 Nov 30, 2019 Updated 6 years ago
- [EMNLP 2022] This is the code repo for our EMNLP'22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"… ☆13 Oct 20, 2022 Updated 3 years ago
- Fusion for TREC run files with popular fusion techniques ☆21 Aug 26, 2022 Updated 3 years ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ☆349 Dec 16, 2024 Updated last year
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023) ☆15 Jul 24, 2023 Updated 2 years ago
- Assignments for AML course @ UvA. Fall 2017 ☆14 Nov 22, 2017 Updated 8 years ago
- Based on the paper "Chain-of-Retrieval Augmented Generation" ☆14 Mar 29, 2025 Updated last year
- A PyTorch toolbox for domain adaptation, domain generalization, federated learning DA/DG, active learning DA/DG, ALDG and semi-supervised… ☆11 Jan 10, 2022 Updated 4 years ago
- ☆13 Feb 5, 2022 Updated 4 years ago
- Toolkit for domain-specific information retrieval experimentation ☆19 Apr 11, 2026 Updated 3 weeks ago
- Resources for the Tutorial on "Utilizing Knowledge Bases in Text-centric Information Retrieval" ☆25 Sep 18, 2016 Updated 9 years ago
- ☆12 May 17, 2022 Updated 3 years ago
- A library for open domain query facet extraction and generation ☆16 Apr 24, 2024 Updated 2 years ago
- ☆39 Jul 25, 2024 Updated last year
- AllenNLP integration for Shiba: Japanese CANINE model ☆12 Jun 26, 2021 Updated 4 years ago
- INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions ☆16 Jan 21, 2025 Updated last year
- Evaluation tools shared across anserini, pyserini, and pygaggle ☆35 Apr 23, 2026 Updated last week
- WebConf 2020 paper "Leading Conversational Search by Suggesting Useful Questions" ☆33 May 4, 2020 Updated 5 years ago
- ☆19 Mar 23, 2025 Updated last year
- Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval ☆26 Aug 7, 2023 Updated 2 years ago
- Facebook link prediction Kaggle challenge. ☆15 Aug 10, 2014 Updated 11 years ago
- ☆15 Oct 10, 2021 Updated 4 years ago
- ☆13 Jan 20, 2023 Updated 3 years ago
- [ACL 2024] This is the code repo for our ACL'24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping". ☆229 Aug 28, 2024 Updated last year
- A toolkit for end-to-end neural ad hoc retrieval ☆97 Aug 20, 2024 Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 Jun 27, 2024 Updated last year
- Docker for UTH-BERT: https://ai-health.m.u-tokyo.ac.jp/uth-bert ☆14 Mar 24, 2023 Updated 3 years ago
- Code for the arXiv preprint "MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts" ☆24 Nov 29, 2023 Updated 2 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆46 Sep 22, 2020 Updated 5 years ago