caltechlibrary / documentarist
Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
☆12 · Updated 3 years ago
Alternatives and similar repositories for documentarist:
Users interested in documentarist are comparing it to the libraries listed below.
- ☆16 · Updated 10 months ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Updated last year
- Full-featured Algorithmic Intelligence Music Augmentator (AIMA) with full multi-instrument MIDI output and Karaoke support. ☆9 · Updated 4 years ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆16 · Updated last month
- Post-processing OCR errors with seq2seq models ☆28 · Updated 4 years ago
- Visualize large text collections with WebGL ☆25 · Updated 8 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated 2 years ago
- A selection of test lines of several early printed books as well as the corresponding individual OCRopus models and mixed models. ☆10 · Updated 7 years ago
- Example of building a working Spanish-to-English translation model with Marian NMT ☆22 · Updated 5 years ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset. ☆25 · Updated 3 years ago
- Unicode Text to IPA Converter ☆21 · Updated 10 years ago
- Segmenting a given document using the recursive XY-cut algorithm. ☆12 · Updated 6 years ago
- A Python package to get useful information from documents using the TopicRank algorithm. ☆16 · Updated last year
- Audio processing using deep neural networks. Speaker identification using voice embeddings. ☆13 · Updated 2 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 · Updated 7 years ago
- An index data structure for approximate string search. ☆23 · Updated 6 years ago
- Gentle and praatio scripts for easy forced alignment ☆18 · Updated 2 years ago
- Text classification automl ☆21 · Updated 3 years ago
- Finds linguistic patterns effortlessly ☆36 · Updated last year
- From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pa… ☆17 · Updated 9 years ago
- An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT3). The legacy GUI is no longer maintained. ☆21 · Updated last year
- Generic Environment for Context-Aware Correction of Orthography ☆22 · Updated 2 years ago
- Experiments with Hugging Face 🔬 🤗 ☆44 · Updated 8 months ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Updated 11 years ago
- Reproducing "Writing with Transformer" demo, using aitextgen/FastAPI in backend, Quill/React in frontend ☆28 · Updated 4 years ago
- Using Conditional Random Fields for segmenting Latin words written in scriptio continua ☆10 · Updated 6 years ago
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b… ☆9 · Updated 4 years ago
- Generate variations of text through synonym matching ☆12 · Updated 7 years ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… ☆11 · Updated 6 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 · Updated 3 years ago