commoncrawl / ia-web-commons
Web archiving utility library
☆11 Updated last month
Alternatives and similar repositories for ia-web-commons:
Users who are interested in ia-web-commons are comparing it to the libraries listed below.
- ☆38 Updated last year
- ☆97 Updated 2 years ago
- Unified Learned Sparse Retrieval Framework ☆64 Updated 11 months ago
- minimal pytorch implementation of bm25 (with sparse tensors) ☆101 Updated last year
- Dense hybrid representations for text retrieval ☆62 Updated 2 years ago
- Pretraining Efficiently on S2ORC! ☆161 Updated 6 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆104 Updated last year
- ☆45 Updated 3 years ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering ☆170 Updated 3 years ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆150 Updated 11 months ago
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆61 Updated last year
- Repo to hold code and track issues for the collection of permissively licensed data ☆23 Updated 2 weeks ago
- ☆16 Updated 4 months ago
- provides a common interface to many IR measure tools ☆84 Updated last week
- ☆29 Updated last year
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆75 Updated 3 years ago
- Retrieval-Augmented Generation battle! ☆49 Updated 4 months ago
- A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. ☆71 Updated 8 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆218 Updated 5 months ago
- Inquisitive Parrots for Search ☆190 Updated last year
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆45 Updated last year
- Pipeline for pulling and processing online language model pretraining data from the web ☆177 Updated last year
- 🚢 Data Toolkit for Sailor Language Models ☆88 Updated 2 months ago
- ☆86 Updated 3 weeks ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆128 Updated last year
- The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences". ☆69 Updated last year
- Scalable training for dense retrieval models. ☆292 Updated 2 months ago
- A multilingual version of MS MARCO passage ranking dataset ☆144 Updated last year
- ☆72 Updated last year