raleighpublicrecord / dochive
Structured Data from PDF image-based files
☆88Updated 12 years ago
Alternatives and similar repositories for dochive:
Users who are interested in dochive are comparing it to the repositories listed below
- Tools for working with Optical Character Recognition output☆16Updated 11 years ago
- A place to collect and share knowledge about liberating data from PDFs☆54Updated 3 years ago
- Docker container to provide Apache Tika RESTful API☆41Updated 9 years ago
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th…☆44Updated 2 years ago
- LoadKit supports Extract, Transform, Load processes based on ArchiveKit buckets.☆11Updated 10 years ago
- A queue-controlled browser automation tool for improving web crawl quality☆61Updated last month
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆24Updated 7 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]☆24Updated 8 years ago
- Data storytelling. See the link for detailed documentation: http://lab41.github.io/gestalt.☆20Updated 8 years ago
- ☆24Updated 9 years ago
- Ideas for (tech) stuff to research, build or work on.☆50Updated 4 months ago
- Tools for generating portable data portals☆58Updated 2 years ago
- A platform for tools that do stuff with data☆56Updated 6 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3.☆15Updated 10 years ago
- neonion is a user-centered collaborative semantic annotation webapp developed at the Human-Centered Computing group at Freie Universität …☆68Updated 6 years ago
- Schemas and helpful handlers for OADA-related formats.☆16Updated 4 years ago
- Monitor datasets, get alerts when something happens☆210Updated 6 years ago
- Create and validate Data Packages in the browser☆27Updated 3 years ago
- Easily crowdsource the analysis of your documents☆102Updated 7 years ago
- [DEPRECATED] Please use https://github.com/frictionlessdata/specs☆17Updated 7 years ago
- JavaScript app for displaying annotated network graphs from the LittleSis API and other data sources☆39Updated 7 years ago
- (DEPRECATED) Parser for U.S. federal regulations and other regulatory information☆55Updated 6 years ago
- Trough: Big data, small databases.☆41Updated 9 months ago
- View, visualize, clean and process data in the browser.☆148Updated 6 years ago
- Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.☆84Updated 9 years ago
- Moved to:☆58Updated 5 years ago
- Serapis is a sentence identifier and modeling pipeline / built for Wordnik☆24Updated 8 years ago
- CoVE is a web application to Convert, Validate and Explore data following certain open data standards - including 360Giving, Open Contra…☆44Updated 3 weeks ago
- XML Director - XML Content Management☆16Updated last year
- Generate westminster parliament charts as virtual-dom SVG.☆12Updated 3 years ago