jlsutherland/doc2text

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.

☆1,279

Alternatives and similar repositories for doc2text

Users who are interested in doc2text are comparing it to the libraries listed below.

WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,255 · Jun 24, 2022 · Updated 4 years ago
david-gpu / srez
Image super-resolution through deep learning
☆5,272 · Aug 16, 2017 · Updated 8 years ago
vipul-sharma20 / document-scanner
An OpenCV based document scanner
☆827 · Aug 20, 2016 · Updated 9 years ago
ankitaggarwal011 / PyCNN
Image Processing with Cellular Neural Networks in Python
☆544 · Nov 1, 2018 · Updated 7 years ago
ocropus-archive / DUP-ocropy
Python-based tools for document analysis and OCR
☆3,466 · May 22, 2021 · Updated 5 years ago
mzucker / noteshrink
Convert scans of handwritten notes to beautiful, compact PDFs
☆4,841 · Mar 20, 2024 · Updated 2 years ago
nerevu / riko
A Python stream processing engine modeled after Yahoo! Pipes
☆1,601 · Updated this week
openpaperwork / paperwork
Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab
☆2,434 · Mar 26, 2026 · Updated 3 months ago
ajbrock / Neural-Photo-Editor
A simple interface for editing natural photos with generative neural networks.
☆2,074 · Mar 22, 2017 · Updated 9 years ago
YelpArchive / undebt
A fast, straightforward, reliable tool for performing massive, automated code refactoring
☆1,624 · Apr 5, 2021 · Updated 5 years ago
pcbje / gransk
Document processing for investigations
☆251 · Jan 7, 2017 · Updated 9 years ago
paarthneekhara / text-to-image
Text to image synthesis using thought vectors
☆2,161 · Jan 30, 2018 · Updated 8 years ago
alex-sherman / deco
☆1,567 · Nov 3, 2021 · Updated 4 years ago
nvdv / vprof
Visual profiler for Python
☆3,980 · Jul 15, 2022 · Updated 4 years ago
dschep / ntfy
🖥️📱🔔 A utility for sending notifications, on demand and when commands finish.
☆4,964 · Oct 27, 2025 · Updated 8 months ago
mzucker / page_dewarp
Text page dewarping using a "cubic sheet" model
☆1,520 · Mar 2, 2023 · Updated 3 years ago
christabor / flask_jsondash
Build complex dashboards without any front-end code. Use your own endpoints. JSON config only. Ready to go.
☆3,283 · Jun 26, 2026 · Updated 3 weeks ago
DerwenAI / pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
☆2,218 · Jun 24, 2026 · Updated 3 weeks ago
pystitch / stitch
Write reproducible reports in Markdown
☆440 · Dec 21, 2018 · Updated 7 years ago
the-paperless-project / paperless
Scan, index, and archive all of your paper documents
☆7,917 · Apr 6, 2021 · Updated 5 years ago
gregdurrett / berkeley-doc-summarizer
The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploit…
☆745 · Feb 25, 2019 · Updated 7 years ago
jjangsangy / ExplainToMe
Automatic Web Article Summarizer
☆417 · Sep 8, 2021 · Updated 4 years ago
attic-labs / noms
The versioned, forkable, syncable database
☆7,425 · Aug 27, 2021 · Updated 4 years ago
HiFaraz / node-playbook
Get started fast with Node.js
☆1,384 · Sep 2, 2018 · Updated 7 years ago
fgadaleta / deeplearning-ahem-detector
☆438 · Jul 20, 2018 · Updated 8 years ago
cisocrgroup / Resources
Manuals, lexica, OCR test data for PoCoTo and the profiler
☆15 · Jul 2, 2021 · Updated 5 years ago
jsvine / waybackpack
Download the entire Wayback Machine archive for a given URL.
☆3,217 · Apr 21, 2025 · Updated last year
pseudo-lang / pseudo
transpile algorithms/libs to idiomatic JS, Go, C#, Ruby
☆687 · Mar 25, 2021 · Updated 5 years ago
Kinto / kinto
A generic JSON document store with sharing and synchronisation capabilities.
☆4,419 · Updated this week
facebookresearch / fastText
Library for fast text representation and classification.
☆26,551 · Mar 22, 2024 · Updated 2 years ago
pavelgonchar / colornet
Neural Network to colorize grayscale images
☆3,555 · Apr 21, 2020 · Updated 6 years ago
zhoubear / open-paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
☆2,559 · Dec 10, 2018 · Updated 7 years ago
dsys / match
Scalable reverse image search built on Kubernetes and Elasticsearch
☆1,264 · Jul 25, 2020 · Updated 5 years ago
roberdam / Xaddress
Xaddress - Give 7 billion people an instant physical address
☆1,182 · Sep 30, 2022 · Updated 3 years ago
metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,076 · Jun 15, 2023 · Updated 3 years ago
hugapi / hug
Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.
☆6,884 · Jul 4, 2024 · Updated 2 years ago
kootenpv / whereami
Uses WiFi signals and machine learning to predict where you are
☆5,139 · Nov 30, 2023 · Updated 2 years ago
tensorflow / skflow
Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
☆3,168 · Aug 30, 2021 · Updated 4 years ago
sirfz / tesserocr
A Python wrapper for the tesseract-ocr API
☆2,165 · Mar 16, 2026 · Updated 4 months ago