caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12 Updated 3 years ago
Alternatives and similar repositories for documentarist
Users who are interested in documentarist are comparing it to the libraries listed below.
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b… ☆9 Updated 4 years ago
- Transcribes and summarizes speech or audio ☆37 Updated 3 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 2 months ago
- Experiments with Hugging Face 🔬 🤗 ☆44 Updated 10 months ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆16 Updated this week
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models ☆32 Updated last year
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated 2 years ago
- Embedding Visualizer (EmbedViz) data app made with Streamlit library ☆23 Updated 5 years ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Updated last year
- Python tools for Tesseract OCR training ☆25 Updated 3 years ago
- Text classification automl ☆21 Updated 4 years ago
- App to explore latent spaces of music collections ☆34 Updated last year
- Visualize large text collections with WebGL ☆26 Updated 10 months ago
- DFKI Layout Detection for OCR-D ☆47 Updated 2 months ago
- Post-processing OCR errors with seq2seq models ☆28 Updated 4 years ago
- Generic Environment for Context-Aware Correction of Orthography ☆22 Updated 2 years ago
- Finds linguistic patterns effortlessly ☆37 Updated last year
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆53 Updated 2 years ago
- Segmenting a given document using recursive xy-cut algorithm. ☆12 Updated 6 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Updated 7 years ago
- audio, NLP, ML with huggingface, nvidia/nemo, speechbrain ☆11 Updated last year
- A fast framework for pre-processing (Cleaning text, Reduction of vocabulary, Feature extraction and Vectorization). Implemented with par… ☆10 Updated 3 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 Updated 5 months ago
- Collection of tools to extract features from film material. ☆39 Updated 7 years ago
- Another implementation of the paper "Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs" in… ☆13 Updated 4 years ago
- I have customized the code of Adrian to find 4 points of document or rectangle dynamically. Here i have added findLargestCountours and co… ☆38 Updated 7 years ago
- A system for reading scanned documents and grouping them into high level topics ☆15 Updated 4 years ago
- Apply different text recognition services to images of handwritten documents. ☆183 Updated 2 years ago
- Audio processing using deep neural networks. Speaker identification using voice embeddings. ☆13 Updated 2 years ago
- Using Machine Learning to Create High-Res Fine Art ☆13 Updated last year