caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12 Updated 3 years ago
Alternatives and similar repositories for documentarist
Users who are interested in documentarist are comparing it to the libraries listed below
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Updated last year
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆17 Updated last week
- ☆12 Updated last year
- Post-processing OCR errors with seq2seq models ☆28 Updated 5 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 4 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated 2 years ago
- Apply different text recognition services to images of handwritten documents. ☆184 Updated 2 years ago
- ☆20 Updated 4 years ago
- Using Conditional Random Fields for segmenting Latin words written in scriptio continua ☆10 Updated 7 years ago
- Ergonomic line-by-line transcription of scanned text. ☆53 Updated 4 years ago
- Reproducing "Writing with Transformer" demo, using aitextgen/FastAPI in backend, Quill/React in frontend ☆28 Updated 4 years ago
- Segmenting a given document using recursive xy-cut algorithm. ☆12 Updated 6 years ago
- An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT3). The legacy GUI is no longer maintained. ☆21 Updated last year
- Visualize large text collections with WebGL ☆26 Updated last year
- Finds linguistic patterns effortlessly ☆38 Updated 2 years ago
- Python tools for Tesseract OCR training ☆25 Updated 3 years ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated 2 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- 🦁 Nala is an agile open-source voice assistant framework (20+ actions). ☆35 Updated 2 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 Updated 8 months ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset. ☆27 Updated 3 years ago
- Deep Neural Networks for audio classification ☆11 Updated last year
- Deploy DL/ML inference pipelines with minimal extra code. ☆99 Updated 10 months ago
- Simple implementations of different kinds of VAE in tf.keras ☆13 Updated 5 years ago
- Document Search Engine Tool ☆74 Updated 2 years ago
- Using Gradio interface to build UI for converting text to speech ☆13 Updated 4 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 Updated last year
- Text classification automl ☆21 Updated 4 years ago
- Visual similarity search engine demo with use of PyTorch Metric Learning and Qdrant ☆12 Updated 2 years ago