caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12 Updated 2 years ago
Alternatives and similar repositories for documentarist:
Users who are interested in documentarist are comparing it to the libraries listed below.
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b… ☆9 Updated 4 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 Updated 6 years ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆15 Updated this week
- Visualize large text collections with WebGL ☆25 Updated 6 months ago
- ☆16 Updated 9 months ago
- Finds linguistic patterns effortlessly ☆35 Updated last year
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Updated last year
- Easy formatted text extraction from images using Google Vision API ☆41 Updated 3 years ago
- A financial disclosure data extraction tool. ☆14 Updated last year
- Text classification automl ☆21 Updated 3 years ago
- An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT3). The legacy GUI is no longer maintained. ☆21 Updated last year
- Demonstration of gpt-2 model with flask+uwsgi+nginx in web environment containerized in docker for quick deployment. ☆13 Updated 2 years ago
- Transcribe audio to text with various Speech to Text Tools ☆17 Updated 4 years ago
- Custom Python functions for working with SQLite FTS4 ☆22 Updated 2 years ago
- Creating a simple recommendation system on the basis of similarity ☆10 Updated 6 years ago
- Tools for working with book data ☆18 Updated this week
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated last year
- Example of building a working Spanish-to-English translation model with Marian NMT ☆21 Updated 4 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆38 Updated 3 years ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆19 Updated 2 years ago
- Datasets for hackernews posts ☆16 Updated 3 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 Updated 11 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 Updated 4 years ago
- Python based Wikidata framework for easy dataframe extraction ☆43 Updated last year
- A PDF classifier ensemble with REST API service ☆23 Updated 4 years ago
- A PDFMiner wrapper to ease the text extraction from pdf files. ☆25 Updated 11 years ago
- Python tools for Tesseract OCR training ☆25 Updated 2 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- Analyze XML extracted from PDFs (e.g. from TET or PDFMiner) ☆20 Updated 7 years ago
- Segmenting a given document using recursive xy-cut algorithm. ☆12 Updated 6 years ago