rajbot / autocrop
This is a side project from 2008. This package contains a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe bookscanner.
☆28 · Updated 12 years ago
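The description above says only that the tool crops and deskews page images. It does not say how. As a rough illustration of the cropping half, here is a hypothetical Python sketch. It is not autocrop's actual method, which is not described here. It treats a grayscale image as a list of rows of 0 to 255 values and trims it to the bounding box of pixels darker than an assumed `threshold`. The function names, the `threshold` default, and the optional `margin` are illustrative assumptions. Deskewing is not covered.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: grayscale rows, 0 = black, 255 = white.
Image = List[List[int]]


def content_bbox(img: Image, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of pixels darker than threshold.

    bottom and right are exclusive; returns None if no pixel is dark enough.
    """
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(img[0])) if any(row[x] < threshold for row in img)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def autocrop(img: Image, threshold: int = 128, margin: int = 0) -> Image:
    """Crop img to its dark-pixel bounding box, padded by margin and clamped to the image."""
    box = content_bbox(img, threshold)
    if box is None:
        # Nothing darker than the threshold: return an unmodified copy.
        return [row[:] for row in img]
    top, left, bottom, right = box
    top = max(0, top - margin)
    left = max(0, left - margin)
    bottom = min(len(img), bottom + margin)
    right = min(len(img[0]), right + margin)
    return [row[left:right] for row in img[top:bottom]]


if __name__ == "__main__":
    page = [
        [255, 255, 255, 255, 255],
        [255, 255, 10, 255, 255],
        [255, 20, 30, 255, 255],
        [255, 255, 255, 255, 255],
    ]
    print(autocrop(page))  # [[255, 10], [20, 30]]
```

A real scanner pipeline would also need to cope with noise, page edges, and shadows, which a single global threshold like this one does not handle.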
Alternatives and similar repositories for autocrop
Users who are interested in autocrop are comparing it to the libraries listed below.
- Experiments mining image collections using OpenCV ☆64 · Updated 10 years ago
- A MediaWiki-to-HTML parser for Python. ☆54 · Updated 6 years ago
- Language checker and hyphenator extension for LibreOffice ☆12 · Updated 5 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op… ☆25 · Updated 10 years ago
- Image comparison QA tool for digital preservation workflows. ☆14 · Updated 11 years ago
- Some convenient natural language tools that build on NLTK. ☆85 · Updated 11 years ago
- An expandable and scalable OCR pipeline ☆89 · Updated 8 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 4 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 · Updated 9 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 · Updated 11 years ago
- Lightweight, multilingual natural language processing ☆63 · Updated 12 years ago
- A DSL to build Lucene text queries in Python. ☆38 · Updated 8 years ago
- natural language processing with link-grammar ☆18 · Updated 16 years ago
- Bash-style pipelining for Python generators. ☆17 · Updated 14 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆64 · Updated 4 months ago
- A Git-based CMS for Django ☆70 · Updated 4 years ago
- Check out https://github.com/webrecorder/webrecorder for a newer version matching https://webrecorder.io ☆38 · Updated 10 years ago
- Convert URLs to a normalized Unicode format ☆67 · Updated 7 years ago
- Data science tools from Moz ☆23 · Updated 8 years ago
- DEPRECATED - Code for source.mozillaopennews.org/ ☆37 · Updated 6 years ago
- Django feeds provides an extensive database model for RSS feeds and a fault tolerant parser. ☆30 · Updated 13 years ago
- Django framework for crowdsourcing complex tasks using MTurk ☆64 · Updated 14 years ago
- collection of modules to build distributed and reliable concurrent systems in Python. ☆206 · Updated 12 years ago
- A Python implementation of the Double Metaphone algorithm ☆61 · Updated 15 years ago
- Python library and command line tool for converting data from one format to another ☆99 · Updated 5 years ago
- A storage layer for numeric data that changes over time ☆333 · Updated 9 years ago
- Interactive Programming Notebook for the Web Browser ☆98 · Updated 5 years ago
- Aelius is a suite of Python, NLTK-based modules and language data for training and evaluating POS-taggers for Brazilian Portuguese and an… ☆19 · Updated 14 years ago
- A library for extracting tables from PDF files ☆89 · Updated 12 years ago
- OCR for DjVu ☆47 · Updated 3 years ago