caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12, updated 3 years ago
Alternatives and similar repositories for documentarist
Users interested in documentarist are comparing it to the repositories listed below.
- OCR-D post-correction module based on weighted finite-state transducers (☆11, updated last year)
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… (☆33, updated 2 years ago)
- Segments a document using the recursive XY-cut algorithm (☆12, updated 7 years ago)
- DFKI Layout Detection for OCR-D (☆47, updated 7 months ago)
- Run Tesseract with the tesserocr bindings using @OCR-D's interfaces (☆39, updated 7 months ago)
- (no description) (☆15, updated last year)
- Finds linguistic patterns effortlessly (☆39, updated 2 years ago)
- (no description) (☆12, updated last year)
- Visualize large text collections with WebGL (☆26, updated last year)
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019 (☆28, updated 5 years ago)
- Transcribes and summarizes speech or audio (☆36, updated 4 years ago)
- A Datasette plugin providing an MLOps platform to train, evaluate, and predict with machine learning models (☆17, updated this week)
- cologne-phonetics implementation in Python (☆17, updated last year)
- Deploy DL/ML inference pipelines with minimal extra code (☆102, updated last year)
- An implementation of Tiling and Corruption (TACo) augmentations for OCR/HTR (☆15, updated 4 years ago)
- A public repository of work for the Speech Verification component of the undergrad squad for Doubtfire (☆13, updated 4 years ago)
- Collection of tools to extract features from film material (☆40, updated 7 years ago)
- Tools for using OpenAI Codex to do various useful things (☆48, updated 4 years ago)
- A machine learning tool for creating training datasets quickly and easily with a smart Chrome extension (☆14, updated 2 years ago)
- A simple similarity-based recommendation system (☆11, updated 7 years ago)
- Text classification AutoML (☆21, updated 4 years ago)
- A text generation Transformer model trained on Reddit posts (☆16, updated 2 years ago)
- 🦁 Nala is an agile open-source voice assistant framework (20+ actions) (☆35, updated 2 years ago)
- Python wrapper for xpdf (☆19, updated 6 years ago)
- Experiments with Hugging Face 🔬 🤗 (☆44, updated last year)
- Python tools for Tesseract OCR training (☆26, updated 3 years ago)
- Highly concurrent and fast content processing for Mighty Inference Server (☆10, updated 2 years ago)
- Building and Using A Seed Corpus for the Human Language Project (☆11, updated 7 years ago)
- Implementation of BertGrid: https://arxiv.org/abs/1909.04948 (☆30, updated last year)
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset (☆27, updated 4 years ago)