rajbot / autocrop
This is a side project from 2008. The package contains a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe book scanner.
☆28 · Updated 12 years ago
Alternatives and similar repositories for autocrop
Users who are interested in autocrop are comparing it to the libraries listed below.
- A DSL to build Lucene text queries in Python. ☆38 · Updated 8 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op… ☆25 · Updated 10 years ago
- Django feeds provides an extensive database model for RSS feeds and a fault tolerant parser. ☆30 · Updated 13 years ago
- Code to remove "noise" from the hOCR output of Tesseract OCR. ☆14 · Updated 8 years ago
- Language checker and hyphenator extension for LibreOffice ☆12 · Updated 5 years ago
- A MediaWiki-to-HTML parser for Python. ☆54 · Updated 5 years ago
- Image comparison QA tool for digital preservation workflows. ☆14 · Updated 10 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 · Updated 11 years ago
- Python bindings to the Tesseract API ☆66 · Updated 9 years ago
- An expandable and scalable OCR pipeline ☆87 · Updated 7 years ago
- Utilities for working with data. ☆20 · Updated 10 years ago
- Some convenient natural language tools that build on NLTK. ☆85 · Updated 11 years ago
- Web service for implementing a large-scale translation memory ☆90 · Updated 4 years ago
- Data science tools from Moz ☆23 · Updated 8 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 3 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 · Updated 7 years ago
- A skip dict is a Python dictionary which is permanently sorted by value. ☆19 · Updated 10 years ago
- Focused Crawler for VT's CTRNet ☆10 · Updated 12 years ago
- Convert URLs to a normalized Unicode format ☆67 · Updated 7 years ago
- Collection of modules for building distributed and reliable concurrent systems in Python. ☆205 · Updated 11 years ago
- Serapis is a sentence identifier and modeling pipeline built for Wordnik ☆24 · Updated 9 years ago
- Experiments mining image collections using OpenCV ☆64 · Updated 10 years ago
- Image processing and image analysis software. (Mirror of source) ☆20 · Updated 14 years ago
- Backport of Python 3.3's standard library module lzma for LZMA/XZ compressed files ☆57 · Updated 3 years ago
- A web-based tool to monitor how your website content is used in Wikipedia ☆37 · Updated 4 years ago
- Python (Cython) bindings for HarfBuzz, an OpenType text shaping engine. ☆19 · Updated 7 years ago
- A Python implementation of the Double Metaphone algorithm ☆61 · Updated 14 years ago
- Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 · Updated 8 years ago
- Your elastic friend to start supervisord processes based on the number of available CPU cores. ☆16 · Updated 9 years ago
- Simple-to-use Python library for Buffer App ☆23 · Updated 2 years ago