rajbot / autocrop
This is a side project from 2008: a tool for automatically cropping and deskewing images of book pages captured with an Internet Archive Scribe book scanner.
☆28 · Updated 12 years ago
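The listing does not say how autocrop finds the page content or the skew angle. As a rough illustration only, here is a pure-Python sketch of one common approach, not autocrop's own code. Cropping uses a tight bounding box around dark ("ink") pixels. Skew estimation uses a projection-profile search: ink coordinates are rotated through candidate angles, and the angle whose row histogram is most sharply peaked is kept, since level text lines pile up into few rows. The function names (`ink_pixels`, `crop_box`, `estimate_skew`), the grayscale list-of-rows image format, the ink threshold of 128 and the ±5° search in 0.5° steps are all hypothetical choices.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical page representation: grayscale rows of ints (0 = black, 255 = white).
Image = List[List[int]]


def ink_pixels(img: Image, threshold: int = 128) -> List[Tuple[int, int]]:
    """(row, col) of every pixel darker than `threshold`."""
    return [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v < threshold]


def crop_box(img: Image, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Tight (top, left, bottom, right) box around the ink; bottom/right are exclusive."""
    pts = ink_pixels(img, threshold)
    if not pts:
        return None  # blank page: nothing to crop to
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def estimate_skew(img: Image, threshold: int = 128,
                  max_deg: float = 5.0, step_deg: float = 0.5) -> float:
    """Projection-profile skew estimate in degrees.

    Positive means text lines drift downward from left to right
    (rows grow downward); undoing that angle levels the lines.
    """
    pts = ink_pixels(img, threshold)
    n = int(round(max_deg / step_deg))
    # Try small angles first so ties resolve toward the smallest correction.
    candidates = sorted((i * step_deg for i in range(-n, n + 1)), key=abs)
    best_angle, best_score = 0.0, -1
    for angle in candidates:
        t = math.radians(angle)
        # Row each ink pixel lands on after undoing a skew of `angle` degrees.
        hist = Counter(round(r * math.cos(t) - c * math.sin(t)) for r, c in pts)
        # Level lines concentrate ink in few rows, maximizing the sum of squares.
        score = sum(count * count for count in hist.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

A real pipeline would then rotate the scan by the estimated angle with an imaging library and crop to the resulting box; that step is left out here because the listing gives no detail about how autocrop does it.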
Alternatives and similar repositories for autocrop
Users that are interested in autocrop are comparing it to the libraries listed below
- Experiments mining image collections using OpenCV ☆64 · Updated 10 years ago
- An expandable and scalable OCR pipeline ☆89 · Updated 8 years ago
- Speech recognition in Python made easy and flexible ☆11 · Updated 10 years ago
- Image comparison QA tool for digital preservation workflows. ☆14 · Updated 11 years ago
- Some convenient natural language tools that build on NLTK. ☆85 · Updated 11 years ago
- Python bindings to the Tesseract API ☆66 · Updated 9 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 3 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op… ☆25 · Updated 10 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 · Updated 9 years ago
- Lightweight, multilingual natural language processing ☆63 · Updated 12 years ago
- A DSL to build Lucene text queries in Python. ☆38 · Updated 8 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 · Updated 11 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 · Updated 7 years ago
- A MediaWiki-to-HTML parser for Python. ☆54 · Updated 6 years ago
- Language checker and hyphenator extension for LibreOffice ☆12 · Updated 5 years ago
- Handwritten optical character recognition ☆25 · Updated 10 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆63 · Updated 3 months ago
- Serapis is a sentence identifier and modeling pipeline built for Wordnik ☆24 · Updated 9 years ago
- Vidscraper is a Python library which provides a simple API for fetching video data from various web services and sites. ☆63 · Updated 3 years ago
- OCR for DjVu ☆47 · Updated 3 years ago
- Aelius is a suite of Python, NLTK-based modules and language data for training and evaluating POS-taggers for Brazilian Portuguese and an… ☆19 · Updated 13 years ago
- Focused Crawler for VT's CTRNet ☆10 · Updated 12 years ago
- ECTOR is a learning chatterbot. pyECTOR is its Python version. ☆13 · Updated 7 years ago
- Data science tools from Moz ☆23 · Updated 8 years ago
- Convert URLs to a normalized Unicode format ☆67 · Updated 7 years ago
- Source code of demo app for image comparison ☆74 · Updated 10 years ago
- Python library implementing the ISO/IEC 26300 OpenDocument Format standard (ODF) ☆54 · Updated 5 years ago
- Django feeds provides an extensive database model for RSS feeds and a fault-tolerant parser. ☆30 · Updated 13 years ago
- Attempts to determine the natural language of a selection of Unicode (UTF-8) text (a clone of http://code.google.com/p/guess-language wit… ☆48 · Updated 15 years ago
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more ☆22 · Updated 9 years ago