alllexx88 / python-docx-split-run
python-docx run manipulation
☆21 Updated 3 years ago
Related projects
Alternatives and complementary repositories for python-docx-split-run
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆15 Updated last week
- ☆15 Updated 3 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 2 months ago
- Python wrapper for xpdf ☆19 Updated 4 years ago
- Python based Wikidata framework for easy dataframe extraction ☆39 Updated 11 months ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated last year
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 Updated 4 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated 7 months ago
- Python tools for Tesseract OCR training ☆25 Updated 2 years ago
- Ergonomic line-by-line transcription of scanned text. ☆47 Updated 3 years ago
- Run OCR, extract information from documents and classify them. In addition, annotate documents and build custom NLP and computer vision m… ☆62 Updated this week
- Text analysis for automatic bookmarking/keyword extraction ☆18 Updated 7 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆43 Updated 3 months ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… ☆11 Updated 5 years ago
- 🌸 Train floret vectors ☆18 Updated last year
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 Updated 2 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 Updated 6 months ago
- Python and data science snippets on the command line ☆21 Updated 3 years ago
- code and data used to build a training dataset for dragnet models ☆10 Updated 3 years ago
- ☆13 Updated 5 years ago
- Statistical visualizations for Datasette using Seaborn ☆11 Updated 2 years ago
- Python wrapper for a C++ Double Metaphone ☆15 Updated last year
- Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents ☆12 Updated 2 years ago
- Write Datasette canned queries as plain SQL files ☆13 Updated 2 years ago
- An index data structure for approximate string search. ☆23 Updated 5 years ago
- python package for performing deduplication using flexible text matching and cleaning in pandas dataframe ☆25 Updated 3 years ago
- Visualize large text collections with WebGL ☆25 Updated 2 months ago
- Transforming textual descriptions into process models using deep learning ☆12 Updated 5 years ago
- Tools for interactive visual exploration of semantic embeddings. ☆28 Updated 2 months ago