jlsutherland / doc2text
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
☆1,273 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for doc2text
- Text page dewarping using a "cubic sheet" model ☆1,442 · Updated last year
- A Python stream processing engine modeled after Yahoo! Pipes ☆1,604 · Updated 2 years ago
- Uses Microsoft Computer Vision API to caption images in an HTML file and fills out its alternative text attributes with the related capti… ☆623 · Updated 7 years ago
- Open source database diagramming and automation tool ☆729 · Updated 5 months ago
- Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab ☆2,432 · Updated 6 years ago
- Neural network OCR. ☆1,129 · Updated 8 years ago
- Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi… ☆1,574 · Updated 11 months ago
- Bare bone examples of machine learning in TensorFlow ☆2,422 · Updated 7 years ago
- Image Processing with Cellular Neural Networks in Python ☆536 · Updated 6 years ago
- A fast, straightforward, reliable tool for performing massive, automated code refactoring ☆1,634 · Updated 3 years ago
- An OpenCV based document scanner ☆798 · Updated 8 years ago
- A framework for creating semi-automatic web content extractors ☆500 · Updated 2 weeks ago
- Python-based tools for document analysis and OCR ☆3,422 · Updated 3 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,220 · Updated 2 years ago
- Python script to do PDF OCR conversion using Tesseract ☆373 · Updated last year
- Make a self hosted OpenVPN server in 15 minutes ☆808 · Updated 7 years ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab ☆930 · Updated 6 years ago
- [not maintained] find out what's hogging your internet connection. ☆1,233 · Updated 5 years ago
- The magic of Google Autocomplete while you're typing. Anywhere. ☆1,540 · Updated last year
- Minimalist and powerful Web Crawler. ☆881 · Updated 3 years ago
- A library for reading text files over multiple cores. ☆1,060 · Updated last year
- Generates a quiz for a Wikipedia page using parts of speech and text chunking. ☆803 · Updated 4 years ago
- The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploit… ☆742 · Updated 5 years ago
- Xaddress - Give 7 billion people an instant physical address ☆1,185 · Updated 2 years ago
- Bringing the python data stack to the shell prompt ☆788 · Updated 3 years ago
- Automatic Web Article Summarizer ☆414 · Updated 3 years ago
- BrainDump is a simple, powerful, and open note taking platform that makes it easy to organize your life. ☆526 · Updated 7 years ago
- Records and reproduces user's in-page behavior ☆726 · Updated 8 years ago
- extract text from any document. no muss. no fuss. ☆3,910 · Updated this week
- +2600 developer-related blogs and publications. ☆636 · Updated 7 years ago