Convert a PDF via OCR to a TXT file in UTF-8 encoding
☆159Oct 3, 2023Updated 2 years ago
Alternatives and similar repositories for ocr2text
Users who are interested in ocr2text are comparing it to the libraries listed below
- Tesseract Powered Windows Desktop OCR Application With Multiple Pre/Post Processing GUI☆42Apr 3, 2024Updated last year
- A free tool to OCR a PDF and add a text "layer" to the original file, making a searchable PDF. Uses only open source tools. Please tip!☆303May 25, 2025Updated 9 months ago
- Python library to extract tabular data from images and scanned PDFs☆284Jul 30, 2024Updated last year
- Interlinear glossing with JS & CSS☆20Aug 23, 2015Updated 10 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF …☆69Nov 7, 2020Updated 5 years ago
- ☆10Mar 16, 2023Updated 3 years ago
- An expandable and scalable OCR pipeline☆90Nov 14, 2017Updated 8 years ago
- jpdfbookmarks - fixes JPdfBookmarks GUI mode, where opening a PDF whose bookmarks include CJK (Chinese, Japanese, Korean) characters shows them like…☆11Sep 4, 2023Updated 2 years ago
- OCR-D python tools☆33Aug 16, 2024Updated last year
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆46Mar 31, 2025Updated 11 months ago
- Next generation OCR engine based on LSTMs.☆51Apr 8, 2018Updated 7 years ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR …☆67Jan 6, 2024Updated 2 years ago
- Missing addon manager for firefox☆17Aug 3, 2023Updated 2 years ago
- HOCR Specification Python Parser☆12Sep 23, 2015Updated 10 years ago
- EDSL code☆19Mar 19, 2022Updated 4 years ago
- Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.☆523Mar 3, 2021Updated 5 years ago
- ☆12Mar 22, 2018Updated 7 years ago
- A GitHub Actions workflow that creates bunkobon-style (paperback) galley proofs from text files.☆11Sep 29, 2021Updated 4 years ago
- Optical character recognition using neural network. Implemented with Python and its libraries Numpy and OpenCV.☆36Oct 1, 2016Updated 9 years ago
- Generate periodic oscillation into an array/audiobuffer☆27May 25, 2020Updated 5 years ago
- Text summarization with python and transformer☆13Jun 17, 2023Updated 2 years ago
- An evaluation of word-embeddings for classification☆32Feb 19, 2019Updated 7 years ago
- An Alfred GUI for Pandoc☆35Aug 28, 2018Updated 7 years ago
- Building API and tools for EPO OPS patent data☆10Mar 16, 2017Updated 9 years ago
- Multiwriter documents over dat☆13May 11, 2020Updated 5 years ago
- SuperCollider bindings to Fomus Music Notation☆26Nov 23, 2022Updated 3 years ago
- Repository of open knowledge about web scraping in Python☆13May 30, 2022Updated 3 years ago
- A basic demo showing how you can make a gossip based p2p chat using hyperswarm.☆29Jan 20, 2020Updated 6 years ago
- Garlmap is the Gapless Almighty Rule-based Logical Mpv Audio Player☆15Feb 27, 2026Updated 3 weeks ago
- Grepify the GUI Regex Text Scanner for Code Reviewers☆23Apr 15, 2013Updated 12 years ago
- One Big Text File (OBTF) Journal in Markdown☆17Jan 17, 2026Updated 2 months ago
- Document Layout Analysis☆401Mar 13, 2026Updated last week
- This is the CoCalc Electron desktop application.☆18Sep 30, 2022Updated 3 years ago
- ☆14Nov 24, 2022Updated 3 years ago
- Flask website integrated with Tesseract-OCR for reading multiple images, extracting text from them, and saving to Word, PDF, or txt file …☆16Jul 10, 2022Updated 3 years ago
- A repository containing the code for the paper 'Real time interactions in oTree using Django Channels: auctions and real effort tasks'☆10Apr 18, 2020Updated 5 years ago
- ☆15Sep 5, 2025Updated 6 months ago
- ☆12Aug 30, 2018Updated 7 years ago
- OCR engine for all the languages☆964Mar 10, 2026Updated last week