commoncrawl / ia-web-commons
Web archiving utility library
☆11 Updated 2 months ago
Alternatives and similar repositories for ia-web-commons
Users who are interested in ia-web-commons are comparing it to the libraries listed below.
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆227 Updated last year
- The pipeline for the OSCAR corpus ☆176 Updated 3 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆106 Updated 2 years ago
- Scalable training for dense retrieval models. ☆298 Updated 8 months ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆177 Updated 2 years ago
- Pretraining Efficiently on S2ORC! ☆179 Updated last year
- ☆38 Updated last year
- ☆72 Updated 2 years ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆192 Updated 7 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆155 Updated last year
- A robust web archive analytics toolkit ☆129 Updated 3 months ago
- DSIR large-scale data selection framework for language model training ☆269 Updated last year
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆104 Updated 3 months ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering ☆175 Updated 4 years ago
- ☆16 Updated last year
- Search Engines with Autoregressive Language models ☆295 Updated 2 years ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆136 Updated last year
- Repository for analysis and experiments in the BigCode project. ☆128 Updated last year
- 🚢 Data Toolkit for Sailor Language Models ☆95 Updated 11 months ago
- INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ. ☆24 Updated 2 years ago
- The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models. ☆182 Updated 3 years ago
- Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization) ☆465 Updated 3 years ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆280 Updated last year
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆72 Updated last year
- The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences". ☆69 Updated 2 years ago
- Inquisitive Parrots for Search ☆199 Updated 8 months ago
- The official code repo for "Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations". ☆85 Updated 2 years ago
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆183 Updated 3 years ago
- A toolkit for building dense retrievers with deep language models. ☆64 Updated 4 years ago
- ☆146 Updated last year