rajbot / autocrop
This is a side project from 2008. This package contains a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe bookscanner.
☆28Updated 12 years ago
Alternatives and similar repositories for autocrop
Users who are interested in autocrop are comparing it to the libraries listed below.
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op…☆25Updated 10 years ago
- Language checker and hyphenator extension for LibreOffice☆12Updated 6 years ago
- Some convenient natural language tools that build on NLTK.☆85Updated 11 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C).☆24Updated 11 years ago
- A queue-controlled browser automation tool for improving web crawl quality☆64Updated 5 months ago
- A MediaWiki-to-HTML parser for Python.☆54Updated 6 years ago
- A Python implementation of the Double Metaphone algorithm☆61Updated 15 years ago
- Experiments mining image collections using OpenCV☆64Updated 10 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR.☆14Updated 9 years ago
- An expandable and scalable OCR pipeline☆89Updated 8 years ago
- Image comparison QA tool for digital preservation workflows.☆14Updated 11 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.☆48Updated 7 years ago
- Django framework for crowdsourcing complex tasks using MTurk☆64Updated 14 years ago
- A simple, quick, powerful web framework☆184Updated 7 years ago
- Implementation of perceptual image hash calculation in Python☆130Updated 2 years ago
- Attempts to determine the natural language of a selection of Unicode (utf-8) text (a clone of http://code.google.com/p/guess-language wit…☆48Updated 15 years ago
- Import GeoNames.org data into a SQLite database for full-text search and autocomplete☆35Updated 6 years ago
- A skip dict is a Python dictionary which is permanently sorted by value.☆19Updated 11 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…☆48Updated 4 years ago
- A DSL to build Lucene text queries in Python.☆38Updated 9 years ago
- Web service for implementing a large-scale translation memory☆92Updated 4 years ago
- Handwritten optical character recognition☆25Updated 10 years ago
- Python bindings to the Tesseract API☆66Updated 9 years ago
- Binary Python bindings for poppler utils for content extraction☆42Updated 4 years ago
- Serapis is a sentence identifier and modeling pipeline / built for Wordnik☆24Updated 9 years ago
- Tools for analysing Python code☆19Updated 8 years ago
- Lightweight, multilingual natural language processing☆63Updated 12 years ago
- Python library with common functionality for writing web scrapers☆102Updated 10 years ago
- Backend part of Paperwork (Python API, no UI)☆18Updated 7 years ago
- A Python framework to generate HTML and JavaScript from reusable and combinable widgets.☆24Updated 3 years ago