caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12Updated 3 years ago
Alternatives and similar repositories for documentarist
Users who are interested in documentarist are comparing it to the repositories listed below.
- Tool for sentiment analysis annotation☆12Updated 3 months ago
- Post-processing OCR errors with seq2seq models☆28Updated 4 years ago
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b…☆9Updated 4 years ago
- Segmenting a given document using recursive xy-cut algorithm.☆12Updated 6 years ago
- OCR-D post-correction module based on weighted finite-state transducers☆11Updated last year
- A fast framework for pre-processing (Cleaning text, Reduction of vocabulary, Feature extraction and Vectorization). Implemented with par…☆10Updated 3 years ago
- Embedding Visualizer (EmbedViz) data app made with Streamlit library☆22Updated 5 years ago
- A Python package to get useful information from documents using TopicRank Algorithm.☆16Updated last year
- Python tools for Tesseract OCR training☆25Updated 3 years ago
- Generic Environment for Context-Aware Correction of Orthography☆22Updated 2 years ago
- simple implementations of different kinds of VAE in tf.keras☆13Updated 5 years ago
- A machine learning tool for creating a training dataset quickly and easily using a smart Chrome extension☆14Updated 2 years ago
- The project lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques☆29Updated 4 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values …☆15Updated 6 years ago
- ☆12Updated 10 months ago
- This repo details an algorithm for creating images containing closely packed circles that don't overlap. Generative art, code art, geomet…☆14Updated 4 years ago
- Corpus Build OCR platform☆8Updated 2 years ago
- Matplotlib Image labeller for classifying images☆10Updated 2 months ago
- Use a TPU in a single line in Colab with the tf2 package.☆11Updated 3 years ago
- A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of ling…☆15Updated 2 years ago
- ☆16Updated last year
- Scraper of ResetEra threads and posts to get them into a format suitable for feeding them into GPT-2.☆15Updated 6 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆46Updated 2 months ago
- Extract knowledge from raw text☆13Updated 3 years ago
- A text mining tool for developing visual and interactive relationship networks from PubMed article information.☆15Updated 10 months ago
- Finds linguistic patterns effortlessly☆36Updated last year
- Wayward is a Python package that helps to identify characteristic terms from single documents or groups of documents. It can be used for …☆9Updated 5 years ago
- ☆11Updated 6 years ago
- Wrapper around pixel classifier☆9Updated 3 years ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models☆16Updated this week