raleighpublicrecord / dochive
Structured Data from PDF image-based files
☆87 Updated 11 years ago
Alternatives and similar repositories for dochive:
Users that are interested in dochive are comparing it to the libraries listed below
- A place to collect and share knowledge about liberating data from PDFs ☆54 Updated 2 years ago
- Tools for working with Optical Character Recognition output ☆16 Updated 10 years ago
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th… ☆44 Updated 2 years ago
- A platform for tools that do stuff with data ☆56 Updated 5 years ago
- A small Docker built for the OCRopus OCR system. ☆19 Updated 7 years ago
- Discover, analyze and present data from the web and mobile in meaningful ways ☆83 Updated 11 years ago
- Tools for exploring the contents of web archive files. ☆39 Updated 4 years ago
- (DEPRECATED) Parser for U.S. federal regulations and other regulatory information ☆54 Updated 6 years ago
- Trough: Big data, small databases. ☆40 Updated 5 months ago
- Detective.io is a platform that hosts your investigation and lets you make powerful queries to mine it. Simply describe your field of stu… ☆138 Updated 9 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆60 Updated 4 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- A fast, responsive HTML5 viewer for scanned items, developed for the World Digital Library. A project of the Library of Congress. Note: p… ☆22 Updated 9 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 Updated 9 years ago
- Data Quality Dashboards display statistics on a collection of published data. ☆33 Updated 4 years ago
- LoadKit supports Extract, Transform, Load processes based on ArchiveKit buckets. ☆11 Updated 9 years ago
- Open Data Index website ☆38 Updated 6 years ago
- Docker container to provide Apache Tika RESTful API ☆40 Updated 8 years ago
- A Relaxed Schema Graph Database Management System ☆52 Updated 4 years ago
- The news homepage archive ☆81 Updated 3 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- Create and validate Data Packages in the browser ☆27 Updated 3 years ago
- View, visualize, clean and process data in the browser. ☆148 Updated 6 years ago
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups ☆62 Updated 8 months ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- An online annotation platform for teaching and learning in the humanities. ☆107 Updated 2 months ago
- REST endpoint for Tabula ☆25 Updated 5 years ago
- See https://github.com/tworavens/tworavens for current repository for this project and http://2ra.vn for project pages. ☆30 Updated 6 years ago