caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12Updated 3 years ago
Alternatives and similar repositories for documentarist
Users who are interested in documentarist are comparing it to the repositories listed below
- OCR-D post-correction module based on weighted finite-state transducers☆11Updated last year
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…☆33Updated 2 years ago
- Segmenting a given document using recursive xy-cut algorithm.☆12Updated 6 years ago
- Instagram-like filters with deep learning☆56Updated last year
- Visualize large text collections with WebGL☆26Updated 11 months ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces☆39Updated 4 months ago
- ☆12Updated last year
- A system for reading scanned documents and grouping them into high level topics☆14Updated 5 years ago
- Tools for using OpenAI Codex to do various useful things☆48Updated 4 years ago
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019☆28Updated 5 years ago
- Experiments with generating GPT-2 fanfiction on specified topics.☆11Updated 6 years ago
- Deploy DL/ML inference pipelines with minimal extra code.☆99Updated 9 months ago
- A Machine Learning tool to create the training dataset very quickly & easily by using a smart chrome extension☆14Updated 2 years ago
- Guess the Hacker News titles☆12Updated 3 years ago
- Visual search interface☆11Updated 3 years ago
- Ergonomic line-by-line transcription of scanned text.☆53Updated 4 years ago
- Apply different text recognition services to images of handwritten documents.☆183Updated 2 years ago
- Experiments with Hugging Face 🔬 🤗☆44Updated last year
- Reproducing "Writing with Transformer" demo, using aitextgen/FastAPI in backend, Quill/React in frontend☆28Updated 4 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles☆23Updated 6 years ago
- Finds linguistic patterns effortlessly☆38Updated 2 years ago
- Visual similarity search engine demo with use of PyTorch Metric Learning and Qdrant☆12Updated 2 years ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.☆27Updated 3 years ago
- Visual Clustering: Clustering Plotted Data by Image Segmentation☆25Updated 6 months ago
- a graph definition and execution library for python☆16Updated 2 years ago
- Text classification automl☆21Updated 4 years ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)☆54Updated 2 years ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…☆23Updated 4 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆39Updated 3 years ago
- This repository hosts code for converting the original MLP Mixer models (JAX) to TensorFlow.☆15Updated 3 years ago