scantailor / ScanTailor-CLI-GUI
Batch processing helper – GUI – for “ScanTailor-CLI” -- created by Csaba Kovacs
☆15 Updated 9 years ago
Alternatives and similar repositories for ScanTailor-CLI-GUI
Users who are interested in ScanTailor-CLI-GUI are comparing it to the libraries listed below
- Building scantailor and its dependencies ☆64 Updated 2 years ago
- Fast PDF generation and compression. Deals with millions of pages daily. ☆126 Updated 2 months ago
- Automatic de-keystoning for single camera DIY book scanners. ☆50 Updated 5 years ago
- smoothscan is a tool to convert scanned text into a vectorized output form. ☆67 Updated 12 years ago
- A free Windows graphical interface to the Tesseract 4.0 OCR engine. ☆61 Updated 3 years ago
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR ☆35 Updated 9 years ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆66 Updated last year
- PDF to DjVu converter ☆98 Updated last year
- ScanTailor Universal - a fork based on Enhanced+Featured+Master versions of ST ☆224 Updated last week
- OCR for DjVu ☆47 Updated 3 years ago
- Automatic de-keystoning for single camera DIY book scanners ☆24 Updated 9 years ago
- Conversions between various OCR formats ☆81 Updated 2 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆196 Updated 6 months ago
- CDXJ Indexing of WARC/ARCs ☆30 Updated 11 months ago
- Efficient hOCR tooling ☆52 Updated 3 months ago
- Tools to process books in a cloud based pipeline system ☆64 Updated 7 months ago
- search interface for scholarly works ☆85 Updated last year
- Scripts to auto-OCR PDFs, translate output using publicly-available or DIY NLP translation models, and generate epub/PDF ☆44 Updated last year
- A list of things related to software, literature, and other content for 🕣 Memento ☆102 Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆52 Updated this week
- Format Identification for Digital Objects (FIDO) is a Python command-line tool to identify the file formats of digital objects. It is des… ☆158 Updated 8 months ago
- Documentation and use cases for ALTO XML ☆41 Updated 7 years ago
- Crop And Splice Segments (of scanned pages) ☆14 Updated 6 years ago
- Docker setup for OCR4all bundled with Larex ☆22 Updated last year
- A post-processing tool for scanned sheets of paper. ☆85 Updated last year
- Like macOS `open` but for Windows ☆13 Updated 4 years ago
- Cross-platform library client to automate any OPAC and library catalog from your local device, e.g. for renewing of borrowed books or se… ☆44 Updated 3 weeks ago
- The hOCR Embedded OCR Workflow and Output Format ☆75 Updated last year
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆55 Updated 2 months ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 Updated 7 months ago