rajbot / autocrop
This is a side project from 2008. This package contains a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe bookscanner.
☆28 Updated 12 years ago
Alternatives and similar repositories for autocrop
Users who are interested in autocrop are comparing it to the libraries listed below
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 Updated 11 years ago
- Python bindings to the Tesseract API ☆66 Updated 9 years ago
- A DSL to build Lucene text queries in Python. ☆38 Updated 9 years ago
- OCR for DjVu ☆47 Updated 3 years ago
- A Python implementation of the Double Metaphone algorithm ☆61 Updated 15 years ago
- Find which links on a web page are pagination links ☆29 Updated 9 years ago
- A skip dict is a Python dictionary which is permanently sorted by value. ☆19 Updated 11 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 4 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op… ☆25 Updated 10 years ago
- An expandable and scalable OCR pipeline ☆89 Updated 8 years ago
- Data science tools from Moz ☆23 Updated 9 years ago
- Language checker and hyphenator extension for LibreOffice ☆12 Updated 5 years ago
- A MediaWiki-to-HTML parser for Python. ☆54 Updated 6 years ago
- An attempt at creating a gold standard dataset for backtesting yesterday & today's content-extractors ☆35 Updated 10 years ago
- A simple PDF transcription project for PyBossa ☆19 Updated 10 years ago
- Some convenient natural language tools that build on NLTK. ☆85 Updated 11 years ago
- Convert cron emails to RSS 2.0. It's the least you can do. ☆14 Updated 10 years ago
- Python library with common functionality for writing web scrapers ☆102 Updated 10 years ago
- Python library and command line tool for converting data from one format to another ☆99 Updated 5 years ago
- Speech recognition in Python made easy and flexible ☆11 Updated 10 years ago
- ... just because nltk is too heavy ☆35 Updated 15 years ago
- Experiments mining image collections using OpenCV ☆64 Updated 10 years ago
- Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.) ☆29 Updated 14 years ago
- Implementation of perceptual image hash calculation in Python ☆133 Updated 2 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- mltk - Moz Language Tool Kit ☆12 Updated 10 years ago
- Lightweight, multilingual natural language processing ☆63 Updated 12 years ago
- Where's my URL (dammit) tracking the (HTTP) status of URLs ☆38 Updated 16 years ago
- ☆19 Updated 9 years ago
- Tools to manipulate and extract data from wikipedia dumps ☆47 Updated 12 years ago