Web archiving utility library
☆11 · Mar 11, 2026 · Updated last week
Alternatives and similar repositories for ia-web-commons
Users that are interested in ia-web-commons are comparing it to the libraries listed below.
- ☆17 · Dec 11, 2024 · Updated last year
- Natural language detection, Java bindings for CLD2 ☆17 · Feb 26, 2026 · Updated 3 weeks ago
- a simple variational auto encoder with some exploration ☆12 · Nov 22, 2024 · Updated last year
- A robust web archive analytics toolkit ☆134 · Oct 15, 2025 · Updated 5 months ago
- Live survey of off-the-shelf language identification tools for python ☆27 · Apr 13, 2022 · Updated 3 years ago
- Pure CadQuery models of various electronic boards and components. ☆19 · Jul 22, 2025 · Updated 8 months ago
- ☆12 · Sep 1, 2024 · Updated last year
- Portal Tutorial ☆11 · Feb 3, 2018 · Updated 8 years ago
- The pipeline for the OSCAR corpus ☆176 · Nov 9, 2025 · Updated 4 months ago
- Common Crawl fork of Apache Nutch ☆40 · Mar 12, 2026 · Updated last week
- ☆17 · Jul 9, 2024 · Updated last year
- Source code for COLING 2022 paper "Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models" ☆24 · Sep 21, 2022 · Updated 3 years ago
- [ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Gener… ☆60 · Jul 12, 2024 · Updated last year
- facebook link prediction kaggle challenge. ☆15 · Aug 10, 2014 · Updated 11 years ago
- ☆15 · Oct 10, 2021 · Updated 4 years ago
- An Apache 2.0 fork of HuggingFace's Large Language Model Text Generation Inference ☆19 · Mar 10, 2024 · Updated 2 years ago
- ☆13 · Jan 20, 2023 · Updated 3 years ago
- Automancer is a software application that enables researchers to design, automate, and manage their experiments. ☆26 · Jul 26, 2023 · Updated 2 years ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ☆30 · Jun 12, 2023 · Updated 2 years ago
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models ☆34 · Sep 19, 2025 · Updated 6 months ago
- ☆38 · Apr 17, 2024 · Updated last year
- Query FHIR apis with SQL, for analytics and ML. ☆25 · Jun 15, 2021 · Updated 4 years ago
- Group-conditional DRO to alleviate spurious correlations ☆15 · Jul 15, 2021 · Updated 4 years ago
- Data and preprocessing scripts for SemEval 2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding ☆14 · Feb 3, 2022 · Updated 4 years ago
- Index of URLs to pdf files all over the internet and scripts ☆25 · May 2, 2023 · Updated 2 years ago
- ☆22 · Aug 31, 2021 · Updated 4 years ago
- [EMNLP 2023 Findings] Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt ☆20 · Nov 2, 2023 · Updated 2 years ago
- A simple wrapper for lmdb. Support dict-like operations. ☆23 · Apr 20, 2023 · Updated 2 years ago
- TREC-COVID results - this is a mirror of data on the TREC website in a more convenient format. ☆15 · Aug 31, 2020 · Updated 5 years ago
- IPAdic packaged for easy use from Python. ☆24 · Oct 31, 2021 · Updated 4 years ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) ☆22 · Nov 1, 2023 · Updated 2 years ago
- ☆21 · Feb 19, 2019 · Updated 7 years ago
- A script for collecting the PubMed Central dataset in a language modelling friendly format. ☆25 · Feb 16, 2021 · Updated 5 years ago
- Rust port of TLSH ☆14 · Oct 12, 2025 · Updated 5 months ago
- Romanian Word Embeddings. Here you can find pre-trained corpora of word embeddings. Current methods: CBOW, Skip-Gram, Fast-Text (from Gen… ☆13 · Oct 6, 2025 · Updated 5 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆27 · Feb 11, 2025 · Updated last year
- Semeval-2021 Multilingual and Cross-lingual Word-in-Context Task ☆18 · May 27, 2021 · Updated 4 years ago
- ☆17 · Apr 15, 2023 · Updated 2 years ago
- analysis of public NLP corpora ☆11 · Feb 9, 2023 · Updated 3 years ago