The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆52Nov 21, 2025Updated 4 months ago
Alternatives and similar repositories for institutional-books-1-pipeline
Users that are interested in institutional-books-1-pipeline are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Small python package to measure OCR quality and other related metrics.☆27Feb 19, 2024Updated 2 years ago
- Security research organization dedicated to finding low hanging, critical, vulnerabilities.☆15May 12, 2022Updated 3 years ago
- A common protocol for AI agent tools☆10Oct 21, 2024Updated last year
- Enhaced version of Wikiextrator: A wikipedia dumps extractor☆28Sep 17, 2025Updated 6 months ago
- Simple and easy stable diffusion inference with LightningModule on GPU, CPU and MPS (Possibly all devices supported by Lightning).☆17Jul 27, 2023Updated 2 years ago
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]☆14Jul 11, 2023Updated 2 years ago
- Patch-based inpainting Python library☆12Mar 5, 2023Updated 3 years ago
- ML AI Case Studies and Projects☆10Jun 11, 2020Updated 5 years ago
- ☆13Jun 16, 2021Updated 4 years ago
- decontamination☆29Mar 4, 2026Updated last month
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models☆11Jan 19, 2024Updated 2 years ago
- Website for The State of FOSS in India report.☆10Aug 20, 2021Updated 4 years ago
- The official code of "PixelWorld: Towards Perceiving Everything as Pixels" [TMLR25]☆16Sep 12, 2025Updated 6 months ago
- ☆14Jan 4, 2021Updated 5 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Linear Attention for Efficient Bidirectional Sequence Modeling☆16May 13, 2025Updated 10 months ago
- Ongoing research training transformer language models at scale, including: BERT☆16Apr 25, 2019Updated 6 years ago
- ☆10Jun 8, 2024Updated last year
- docker for HF wav2vec2-sprint☆13Mar 26, 2021Updated 5 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated last year
- ☆13Feb 7, 2023Updated 3 years ago
- ☆18Jan 3, 2025Updated last year
- ☆21Dec 5, 2022Updated 3 years ago
- ☆16May 14, 2024Updated last year
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- ☆13Nov 28, 2025Updated 4 months ago
- ☆10Sep 13, 2022Updated 3 years ago
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization☆12Nov 23, 2021Updated 4 years ago
- ☆17May 31, 2023Updated 2 years ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆15Jun 6, 2025Updated 10 months ago
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…☆13Jan 16, 2024Updated 2 years ago
- ☆16Feb 18, 2023Updated 3 years ago
- Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.☆12May 11, 2021Updated 4 years ago
- “Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition” (EMNLP 2022)☆16Feb 2, 2023Updated 3 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- ☆14Jan 31, 2023Updated 3 years ago
- PathPiece tokenizer☆14Nov 10, 2024Updated last year
- ☆12Mar 20, 2020Updated 6 years ago
- Collection of description of concepts, procedures, and simple XSLT files for text processing, e.g. simplify InDesign documents (.idml) to…☆12Jan 9, 2020Updated 6 years ago
- ☆11Jun 23, 2022Updated 3 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow☆14Apr 30, 2023Updated 2 years ago
- Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models"☆14Aug 19, 2022Updated 3 years ago