caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12 · Updated 3 years ago
Alternatives and similar repositories for documentarist
Users who are interested in documentarist are comparing it to the libraries listed below.
- Post-processing OCR errors with seq2seq models ☆28 · Updated 5 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated 2 years ago
- Using Conditional Random Fields for segmenting Latin words written in scriptio continua ☆10 · Updated 7 years ago
- A Datasette plugin providing an MLOps platform to train, evaluate, and run predictions with machine learning models ☆16 · Updated last week
- Reproducing the "Writing with Transformer" demo, using aitextgen/FastAPI in the backend and Quill/React in the frontend ☆28 · Updated 4 years ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset. ☆27 · Updated 3 years ago
- Visualize large text collections with WebGL ☆26 · Updated 11 months ago
- Simple implementations of different kinds of VAE in tf.keras ☆13 · Updated 5 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Updated 4 years ago
- Transcribes and summarizes speech or audio ☆37 · Updated 3 years ago
- Text classification AutoML ☆21 · Updated 4 years ago
- This project lets you automatically extract glossary words and their definitions from a given piece of text using NLP techniques ☆29 · Updated 5 years ago
- Deploy DL/ML inference pipelines with minimal extra code. ☆100 · Updated 8 months ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Updated last year
- Deep-learning-based reverse image search using the Annoy library ☆16 · Updated 6 years ago
- Deep Neural Networks for audio classification ☆11 · Updated last year
- Experiments with Hugging Face 🔬 🤗 ☆44 · Updated 11 months ago
- A machine learning tool for creating training datasets quickly and easily using a smart Chrome extension ☆14 · Updated 2 years ago
- A workflow system for Natural Language Processing. ☆22 · Updated 5 years ago
- Visual Clustering: Clustering Plotted Data by Image Segmentation ☆25 · Updated 5 months ago
- A Python package to get useful information from documents using the TopicRank algorithm. ☆16 · Updated 2 years ago
- Python tools for Tesseract OCR training ☆25 · Updated 3 years ago
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019 ☆28 · Updated 5 years ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries, and more ☆20 · Updated 6 years ago
- Finds linguistic patterns effortlessly ☆37 · Updated last year
- Take any picture taken with a phone and turn it into a document scan. ☆91 · Updated 11 months ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 · Updated 6 months ago
- ☆20 · Updated 4 years ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 · Updated this week
- Extract information from XBRL files in the ESEF format ☆12 · Updated this week