alllexx88 / python-docx-split-run
python-docx run manipulation
☆21 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for python-docx-split-run
- A Python package to get useful information from documents using the TopicRank algorithm ☆16 · Updated last year
- ☆15 · Updated 3 years ago
- A Datasette plugin providing an MLOps platform to train, evaluate, and predict with machine learning models ☆16 · Updated last week
- Python driver for MobilityDB ☆11 · Updated last year
- Python wrapper for xpdf ☆19 · Updated 4 years ago
- This library builds a graph representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 · Updated 4 years ago
- Python tools for Tesseract OCR training ☆25 · Updated 2 years ago
- The official implementation of the iConference 2022 paper "Identifying Machine-Paraphrased Plagiarism". ☆16 · Updated 2 years ago
- A Python Package for Visualizing Categorical Data Over Time ☆41 · Updated 5 months ago
- Python-based Wikidata framework for easy dataframe extraction ☆39 · Updated 11 months ago
- Binary Python bindings for poppler utils for content extraction ☆42 · Updated 3 years ago
- Plot tree-based machine learning models ☆13 · Updated last month
- scraping and querying documents for LLMs ☆15 · Updated last week
- code and data used to build a training dataset for dragnet models ☆10 · Updated 3 years ago
- Write Datasette canned queries as plain SQL files ☆13 · Updated 2 years ago
- text-data pre-processing utility ☆13 · Updated 2 years ago
- Extract city and country mentions from text like GeoText, but without regex, using FlashText, an Aho-Corasick implementation ☆60 · Updated this week
- sequence tagging with spaCy and crfsuite ☆18 · Updated last year
- A maximum-strength name parser for record linkage. ☆34 · Updated 3 months ago
- Python utility to extract differences between two pandas dataframes. ☆12 · Updated 4 months ago
- A Python library for creating adversarial splits ☆13 · Updated 2 years ago
- ETL of newspaper article keywords using Apache Airflow, Newspaper3k, Quilt T4 and AWS S3 ☆15 · Updated 2 weeks ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆23 · Updated 6 months ago
- Find duplicate text files. ☆11 · Updated 6 months ago
- Post-processing OCR errors with seq2seq models ☆28 · Updated 4 years ago
- Tool for sentiment analysis annotation ☆11 · Updated last month
- Tool for the Automatic Assessment of Lexical Diversity ☆11 · Updated 3 years ago
- Easy-to-use pattern matching and information extraction for Python ☆38 · Updated last year
- Python wrapper for a C++ Double Metaphone ☆15 · Updated last year
- Simple, fast, powerful, and easily extensible Python package for extracting patterns from text, with more than 60 predefined Regular Expre… ☆23 · Updated last year