midorikocak / docsplitter
A small tool to split .docx files by headings.
☆12 Updated 4 years ago
Alternatives and similar repositories for docsplitter:
Users who are interested in docsplitter are comparing it to the repositories listed below:
- XSLT stylesheets to convert TEI to HTML and ePub format. ☆40 Updated this week
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 3 weeks ago
- Fast PDF generation and compression. Deals with millions of pages daily. ☆113 Updated 7 months ago
- The Carnegie Mellon Pronouncing Dictionary (CMUdict). ☆15 Updated 3 weeks ago
- User contributed (non Google) OCR models for Tesseract ☆24 Updated 5 months ago
- Building scantailor and its dependencies ☆57 Updated last year
- GIMP Deskew plugin by Karl Chen ☆27 Updated 5 months ago
- Translate HTML using Argos Translate ☆50 Updated last year
- Document Layout Analysis Projects ☆23 Updated 5 years ago
- « Make your own typeface from your handwriting! ». ⚠ Work in progress. My fork adds multi-page scanned template, French/Spanish accents, … ☆25 Updated 6 years ago
- Minstrel is a FLOSS hybrid reading app specifically designed for Audio-eBooks ☆96 Updated 8 years ago
- Batch processing helper – GUI – for “ScanTailor-CLI” -- created by Csaba Kovacs ☆15 Updated 8 years ago
- The source of the phonetic transcriptions is Oxford Advanced Learner's Dictionary (3rd ed.), available from the Oxford Text Archive (http… ☆23 Updated 7 years ago
- LF Aligner helps translators create translation memories from texts and their translations. It relies on Hunalign for automatic sentence … ☆11 Updated 9 years ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆65 Updated last year
- TMX Editor written in Java and TypeScript ☆42 Updated 3 months ago
- Readk.it: Digital reading simplified ☆83 Updated 5 years ago
- Scripts to auto-OCR PDFs, translate output using publicly-available or DIY NLP translation models, and generate epub/PDF ☆42 Updated 10 months ago
- OCR for DjVu ☆48 Updated 2 years ago
- A script that converts pdf to markdown style text with headers and bullet points ☆11 Updated 9 months ago
- Scan Tailor Experimental is an interactive post-processing tool for scanned pages. ☆59 Updated this week
- Tools to process books in a cloud based pipeline system ☆58 Updated 2 weeks ago
- PDF minifier that allows removing duplicate data, re-compresses images, creation of PDF/A-1b and digital PDF signing ☆55 Updated 6 months ago
- Conversions between various OCR formats ☆74 Updated last year
- A selection of test lines of several early printed books as well as the corresponding individual OCRopus models and mixed models. ☆10 Updated 7 years ago
- Gamera 3 for Python 2 (deprecated) ☆39 Updated 2 years ago
- PAGE XML format collection for document image page content and more ☆67 Updated 3 years ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆148 Updated last year
- Master repository which includes most other OCR-D repositories as submodules ☆72 Updated last week
- A post-processing tool for scanned sheets of paper. ☆80 Updated last year