BayooG / bayoo-docx
Create and modify Word documents with Python
☆143Updated 7 months ago
Alternatives and similar repositories for bayoo-docx:
Users who are interested in bayoo-docx are comparing it to the libraries listed below.
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.☆175Updated this week
- A Python tool to help extract information from structured PDFs.☆391Updated 3 weeks ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆151Updated 2 months ago
- Replace words inside a Word document without losing format☆75Updated 8 months ago
- A pure-Python utility to extract text and images from docx files.☆526Updated last year
- Demos, examples and utilities using PyMuPDF☆618Updated 6 months ago
- Simplify DOCX files to JSON☆224Updated 4 months ago
- Python bindings to PDFium☆493Updated this week
- Python API for PDF documents☆118Updated 4 months ago
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author.☆102Updated 9 months ago
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets☆205Updated last year
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity☆106Updated 3 weeks ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF …☆66Updated 4 years ago
- Pandoc (Python Library)☆146Updated 4 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3)☆149Updated last year
- 80x faster and 95% accurate language identification with fastText☆145Updated last year
- Python interface to Apache PDFBox command-line tools.☆75Updated 2 years ago
- Stripping RTF to plain old text☆98Updated last month
- Python binding to the poppler-cpp PDF library☆105Updated 4 months ago
- A better PDF extraction tool using the latest and fastest Python features☆22Updated 5 months ago
- Pythonic search engine based on PyLucene.☆124Updated 2 months ago
- TokenQuery (regular expressions over tokens)☆28Updated 7 years ago
- Pure-python library for adding annotations to PDFs☆199Updated 3 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆175Updated last year
- Viewer for the structure extracted by Grobid on PDF documents☆44Updated 2 weeks ago
- gcv2hocr converts Google Cloud Vision OCR output to hOCR to make a searchable PDF.☆104Updated 4 years ago
- A curated list of awesome data annotation tools☆200Updated 2 years ago
- Pure-Python full-text search library☆595Updated last year
- ☆28Updated 5 months ago
- Fast multi-keyword search engine for text strings☆250Updated 4 months ago