rajbot / autocrop
A side project from 2008: a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe book scanner.
☆ 28 · Updated 11 years ago
Related projects
Alternatives and complementary repositories for autocrop
- Code to remove "noise" from the hOCR output of Tesseract OCR. ☆ 14 · Updated 8 years ago
- A module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th… ☆ 44 · Updated 2 years ago
- A simple PDF transcription project for PyBossa. ☆ 19 · Updated 9 years ago
- PIL-compatible interface for platform libraries such as GraphicsMagick, Aware or JAI. ☆ 25 · Updated 7 years ago
- A Simple API for RDF. ☆ 29 · Updated 15 years ago
- (no description) ☆ 16 · Updated 8 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆ 15 · Updated 9 years ago
- An expandable and scalable OCR pipeline. ☆ 86 · Updated 6 years ago
- Utilities for working with data. ☆ 19 · Updated 9 years ago
- Code related to making ePub files. ☆ 40 · Updated 8 years ago
- Lightweight, multilingual natural language processing. ☆ 63 · Updated 11 years ago
- Simple-to-use Python library for Buffer App. ☆ 23 · Updated last year
- Python's missing statistical Swiss Army knife. ☆ 15 · Updated 9 years ago
- Experiments mining image collections using OpenCV. ☆ 64 · Updated 9 years ago
- A web-based tool to monitor how your website content is used in Wikipedia. ☆ 37 · Updated 4 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆ 48 · Updated 2 years ago
- LoadKit supports Extract, Transform, Load processes based on ArchiveKit buckets. ☆ 11 · Updated 9 years ago
- A small Docker build for the OCRopus OCR system. ☆ 19 · Updated 6 years ago
- Python bindings to the Tesseract API. ☆ 66 · Updated 8 years ago
- ... just because nltk is too heavy. ☆ 36 · Updated 14 years ago
- An experiment in writing a simple data processing toolkit in Python. ☆ 18 · Updated last year
- Import GeoNames.org data into a SQLite database for full-text search and autocomplete. ☆ 35 · Updated 5 years ago
- Part of eMOP: Franken+ tool for creating font training for the Tesseract OCR engine from page images. ☆ 24 · Updated 9 years ago
- The more often you click a word in the headlines, the more interesting your news becomes. ☆ 13 · Updated 7 years ago
- A queue-controlled browser automation tool for improving web crawl quality. ☆ 60 · Updated 4 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆ 24 · Updated 10 years ago
- Django feeds provides an extensive database model for RSS feeds and a fault-tolerant parser. ☆ 30 · Updated 12 years ago
- A MediaWiki-to-HTML parser for Python. ☆ 53 · Updated 5 years ago
- Serving content from a WARC. ☆ 60 · Updated 11 years ago