institutional/institutional-books-1-pipeline

The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.

☆54

Alternatives and similar repositories for institutional-books-1-pipeline

Users who are interested in institutional-books-1-pipeline are comparing it to the repositories listed below.

Pleias / OCRoscope
Small Python package to measure OCR quality and other related metrics.
☆26 · Feb 19, 2024 · Updated 2 years ago
DEFI-COLaF / LADaS
Layout Analysis Dataset with Segmonto (LADaS)
☆25 · May 29, 2026 · Updated 2 months ago
Post45-Data-Collective / BookReconciler
BookReconciler, A Tool for Metadata Enrichment and Clustering of Book Data
☆40 · Mar 2, 2026 · Updated 4 months ago
NationalLibraryOfNorway / warchaeology
Command line tool for digging into WARC files
☆50 · Updated this week
harvard-lil / warcbench
A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.
☆22 · Jul 30, 2025 · Updated 11 months ago
MaLA-LM / GlotEval
GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way
☆18 · Nov 4, 2025 · Updated 8 months ago
internetarchive / archive-hocr-tools
Efficient hOCR tooling
☆57 · Aug 18, 2025 · Updated 11 months ago
maxdotio / mighty-batch
Highly concurrent and fast content processing for Mighty Inference Server
☆10 · Feb 6, 2023 · Updated 3 years ago
miku / span
Span formats.
☆16 · Jul 22, 2026 · Updated last week
qyhou / curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
☆12 · Aug 7, 2025 · Updated 11 months ago
ljos / navnkjenner
Named-Entity Recognition for Norwegian Bokmål and Nynorsk
☆12 · Aug 5, 2019 · Updated 6 years ago
UKPLab / AdaSent
This repository contains the code for the EMNLP'23 paper "AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classificati…
☆16 · Jun 3, 2024 · Updated 2 years ago
JonnoB / enhance_ocod
A library for working with the OCOD dataset for analysis of property in England and Wales owned by offshore companies
☆14 · May 13, 2026 · Updated 2 months ago
jespino / pgpageshell
An interactive shell for inspecting PostgreSQL data files at the page level. Open any heap or index file and navigate through its 8KB pag…
☆16 · Mar 6, 2026 · Updated 4 months ago
DIPSAS / DockerBuildManagement
Build Management is a Python application, installed with pip. The application makes it easy to manage a build system based on Docker by c…
☆14 · Sep 22, 2021 · Updated 4 years ago
miku / parallel
Process lines in parallel.
☆22 · Jan 23, 2025 · Updated last year
slub / lod-explorativ
lod-explorativ is a prototype of a Svelte webapp which lets you explore bibliographic resources from a topic's point of view.
☆15 · Jan 19, 2022 · Updated 4 years ago
inertia-lab / bookdata-tools
Tools for working with book data
☆20 · Nov 25, 2025 · Updated 8 months ago
frictionlessdata / DataPackage.jl
A Julia library for working with Data Package.
☆11 · Aug 10, 2021 · Updated 4 years ago
huridocs / pdf-reading-order
☆16 · Apr 26, 2024 · Updated 2 years ago
dbmdz / wolpi
Wolpi: A fast and extensible IIIF Image Server
☆17 · Updated this week
agentsea / toolfuse
A common protocol for AI agent tools
☆10 · Oct 21, 2024 · Updated last year
drkane / datasette-reconcile
Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification.
☆24 · Feb 2, 2024 · Updated 2 years ago
aishe-ai / core
LLM Assistant with Chat Integration
☆14 · Sep 5, 2024 · Updated last year
UB-Mannheim / ocromore
Process, enhance and evaluate multiple OCR outputs.
☆24 · Dec 2, 2025 · Updated 7 months ago
dbmdz / historic-ner
Repository for "Towards Robust Named Entity Recognition for Historic German"
☆18 · Dec 11, 2020 · Updated 5 years ago
edchengg / easyproject
ACL 2023 (Findings) End-to-end Cross-lingual Label Project
☆15 · Nov 24, 2023 · Updated 2 years ago
Wickstrom / RELAX
Code for RELAX, a framework for explaining representations.
☆12 · Jan 7, 2024 · Updated 2 years ago
o19s / pdf-discovery-demo
Demonstration of searching PDF documents with Solr, Tika, and Tesseract
☆32 · Oct 18, 2024 · Updated last year
recogito / recogito-studio
Self hosting code for Recogito-Studio
☆23 · Jul 6, 2026 · Updated 3 weeks ago
ieg-dhr / NLP-Course4Humanities_2024
This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl…
☆21 · Jun 5, 2025 · Updated last year
aks2203 / easy-to-hard-data
PyTorch Datasets for Easy-To-Hard
☆30 · Jan 9, 2025 · Updated last year
ltgoslo / NorQuAD
Norwegian question answering dataset
☆15 · Feb 3, 2024 · Updated 2 years ago
VITA-Group / TAPE
[ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…
☆15 · Jun 6, 2025 · Updated last year
zxcalc / sample-projects
Sample projects for Quantomatic
☆12 · Apr 25, 2020 · Updated 6 years ago
pnnl / ML4AlgComb
ML Benchmarks in Algebraic Combinatorics
☆25 · Jan 15, 2026 · Updated 6 months ago
kb-labb / easyaligner
Forced alignment made easy
☆22 · Jul 4, 2026 · Updated 3 weeks ago
iscc / iscc-sdk
ISCC - Software Development Kit
☆22 · Updated this week
bltlab / mot
Multilingual Open Text
☆26 · May 8, 2025 · Updated last year