caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12Updated 3 years ago
Alternatives and similar repositories for documentarist
Users who are interested in documentarist are comparing it to the libraries listed below
- Finds linguistic patterns effortlessly☆36Updated last year
- OCR-D post-correction module based on weighted finite-state transducers☆11Updated last year
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b…☆9Updated 4 years ago
- A PDFMiner wrapper to ease text extraction from PDF files.☆25Updated 12 years ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models☆16Updated this week
- A fast framework for pre-processing (Cleaning text, Reduction of vocabulary, Feature extraction and Vectorization). Implemented with par…☆10Updated 2 years ago
- Generic Environment for Context-Aware Correction of Orthography☆22Updated 2 years ago
- Extract knowledge from raw text☆13Updated 3 years ago
- ☆16Updated 11 months ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy.☆22Updated 7 years ago
- Text classification automl☆21Updated 3 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values …☆15Updated 6 years ago
- An experimental implementation of Burrows' Delta in Python 3☆21Updated 3 years ago
- A machine learning tool to create a training dataset quickly and easily using a smart Chrome extension☆14Updated 2 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…☆34Updated 2 years ago
- A Python package to get useful information from documents using TopicRank Algorithm.☆16Updated last year
- An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT3). The legacy GUI is no longer maintained.☆21Updated last year
- Example of building a working Spanish-to-English translation model with Marian NMT☆22Updated 5 years ago
- Gentle and praatio scripts for easy forced alignment☆18Updated 2 years ago
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models☆32Updated last year
- Wrapper around pixel classifier☆9Updated 3 years ago
- Tools for evaluating OCR performance relative to ground truth.☆10Updated last year
- A PDF classifier ensemble with REST API service☆23Updated 4 years ago
- Python package for converting XML and EPUB files to text files☆34Updated 4 years ago
- 🚀GUI for training spaCy models☆55Updated 4 years ago
- Deep learning-based reverse image search using the Annoy library☆16Updated 6 years ago
- Scripts for building a geo-located web corpus using Common Crawl data☆11Updated last month
- Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries, and more☆20Updated 6 years ago
- Loan Risk Prediction Neural Network and API☆17Updated 4 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆46Updated 2 months ago