lemurproject/ClueWeb22

☆17

Alternatives and similar repositories for ClueWeb22

Users who are interested in ClueWeb22 are comparing it to the repositories listed below.

leogao2 / commoncrawl_downloader
☆33 · May 23, 2023 · Updated 3 years ago
commoncrawl / ia-web-commons
Web archiving utility library
☆11 · Jun 19, 2026 · Updated last month
castorini / TREC-COVID
TREC-COVID results - this is a mirror of data on the TREC website in a more convenient format.
☆15 · Aug 31, 2020 · Updated 5 years ago
JackHCC / Embedded-Microprocessor-System-Homework
Peking University Embedded Microprocessor System Lesson’s all Homework
☆10 · Dec 28, 2021 · Updated 4 years ago
jiquan123 / TIER
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment
☆10 · Mar 1, 2025 · Updated last year
princeton-nlp / PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
☆32 · Jul 9, 2024 · Updated 2 years ago
EleutherAI / pile-cc
☆16 · Mar 25, 2022 · Updated 4 years ago
OpenMatch / TASTE
[CIKM 2023 Oral] This is the code repo for our CIKM’23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…
☆40 · Mar 17, 2024 · Updated 2 years ago
Lurunchik / NF-CATS
☆17 · Jul 18, 2022 · Updated 4 years ago
jiquan123 / I2IQA
PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images
☆15 · Dec 4, 2024 · Updated last year
OpenMatch / MARVEL
[ACL 2024 Oral] This is the code repo for our ACL’24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…
☆39 · Jun 30, 2024 · Updated 2 years ago
NEUIR / ConAE
[EMNLP 2022] This is the code repo for our EMNLP’22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"…
☆13 · Oct 20, 2022 · Updated 3 years ago
tberg12 / cse291spr21
☆10 · Jun 9, 2021 · Updated 5 years ago
chatnoir-eu / chatnoir-resiliparse
A robust web archive analytics toolkit
☆144 · Updated this week
nelson-liu / website
☆13 · Feb 5, 2022 · Updated 4 years ago
thunlp / ConvDR
Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"
☆43 · Dec 9, 2021 · Updated 4 years ago
WorksApplications / chikkar
Japanese synonym library
☆11 · Apr 18, 2022 · Updated 4 years ago
xyltt / LPT
This repo contains the code for Late Prompt Tuning.
☆12 · Dec 22, 2025 · Updated 7 months ago
ikegami-yukino / zunda-python
Zunda: Japanese Enhanced Modality Analyzer client for Python.
☆10 · Nov 30, 2019 · Updated 6 years ago
Georgetown-IR-Lab / covid-neural-ir
☆24 · Oct 23, 2020 · Updated 5 years ago
INK-USC / FiD-ICL
"FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023)
☆15 · Jul 24, 2023 · Updated 2 years ago
microsoft / MS-MARCO-Web-Search
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
☆351 · Dec 16, 2024 · Updated last year
rmit-ir / polyfuse
Fusion for TREC run files with popular fusion techniques
☆21 · Aug 26, 2022 · Updated 3 years ago
ryanhlewis / GPT-Auto-Scraper
A GPT-powered AI auto scraper for websites. AI Web Scraping made easy.
☆13 · Jun 26, 2023 · Updated 3 years ago
nickvosk / sigir2020-query-resolution
☆13 · Jul 25, 2024 · Updated last year
ISosnovik / UVA_AML17
Assignments for AML course @ UvA. Fall 2017
☆13 · Nov 22, 2017 · Updated 8 years ago
MaXuSun / domainext
A PyTorch toolbox for domain adaptation, domain generalization, federated learning DA/DG, active learning DA/DG, ALDG and semi-supervised…
☆11 · Jan 10, 2022 · Updated 4 years ago
hscells / pybool_ir
Toolkit for domain-specific information retrieval experimentation
☆19 · May 18, 2026 · Updated 2 months ago
Relento / hypothesis_search
☆22 · Nov 26, 2024 · Updated last year
yuhongqian / ANCE-PRF
☆12 · May 17, 2022 · Updated 4 years ago
laura-dietz / tutorial-kb4ir
Resources for the Tutorial on "Utilizing Knowledge Bases in Text-centric Information Retrieval"
☆25 · Sep 18, 2016 · Updated 9 years ago
JamesSand / UsefulCommands
Lifelong Learning Note
☆16 · Jun 2, 2026 · Updated last month
algoprog / Faspect
A library for open domain query facet extraction and generation
☆16 · Apr 24, 2024 · Updated 2 years ago
microsoft / Efficient-Large-LM-Trainer
☆39 · Jul 25, 2024 · Updated last year
akirakubo / mecab-mozcdic
☆10 · Jan 12, 2018 · Updated 8 years ago
ujiuji1259 / uke_japanese
☆13 · Dec 21, 2021 · Updated 4 years ago
jinseikenai / uth-bert
Pre-processing text and tokenization for UTH-BERT
☆10 · Sep 30, 2020 · Updated 5 years ago
OpenMatch / SANTA
☆12 · Jul 13, 2023 · Updated 3 years ago
shunk031 / allennlp-shiba-model
AllenNLP integration for Shiba: Japanese CANINE model
☆12 · Jun 26, 2021 · Updated 5 years ago