raleighpublicrecord / dochive
Structured Data from PDF image-based files
☆88 Updated 11 years ago
Alternatives and similar repositories for dochive:
Users who are interested in dochive are comparing it to the libraries listed below:
- Tools for working with Optical Character Recognition output ☆16 Updated 10 years ago
- Discover, analyze and present data from the web and mobile in meaningful ways ☆83 Updated 11 years ago
- A platform for tools that do stuff with data ☆56 Updated 6 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆54 Updated 3 years ago
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th… ☆44 Updated 2 years ago
- Ideas for (tech) stuff to research, build or work on. ☆50 Updated last month
- Docker container to provide Apache Tika RESTful API ☆40 Updated 9 years ago
- LoadKit supports Extract, Transform, Load processes based on ArchiveKit buckets. ☆11 Updated 9 years ago
- Open Data Index website ☆39 Updated 6 years ago
- Parser for U.S. federal regulations and other regulatory information ☆39 Updated last year
- neonion is a user-centered collaborative semantic annotation webapp developed at the Human-Centered Computing group at Freie Universität … ☆68 Updated 6 years ago
- Tools for generating portable data portals ☆58 Updated 2 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- See https://github.com/tworavens/tworavens for current repository for this project and http://2ra.vn for project pages. ☆30 Updated 6 years ago
- Create and validate Data Packages in the browser ☆27 Updated 3 years ago
- CFPB's streaming batch geocoder ☆37 Updated 8 years ago
- **el.vis** - a tool for visualising public (EU) tenders big data ☆8 Updated last year
- A space for code and projects around analysing news content ☆23 Updated 7 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Chambua is an open-source semantic tagging application that analyses text and extracts names of people, places (& geocodes them), organis… ☆33 Updated 3 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- Easily crowdsource the analysis of your documents ☆102 Updated 7 years ago
- A library for extracting tables from PDF files ☆90 Updated 11 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago
- Examples of corrupt CSV files and how they trick various parsers ☆10 Updated 8 years ago
- Just like on ScraperWiki Classic; now a part of QuickCode. ☆38 Updated 8 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 7 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 Updated 9 years ago
- paginate-for-print is trying to recreate some of the basic features of pagination.js without using CSS Regions with a focus on Chrome, Fi… ☆25 Updated 4 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆60 Updated 4 years ago