pjlsergeant / ziprip
Extract postal addresses from the DOM
☆66 Updated 12 years ago
Alternatives and similar repositories for ziprip:
Users who are interested in ziprip are comparing it to the libraries listed below.
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 Updated 7 years ago
- Read natural language interactive queries. Great for bots. ☆18 Updated 8 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆166 Updated 2 years ago
- email dataset for email signature parsing ☆55 Updated 8 years ago
- A Python canonicalizer to disambiguate and recognize known names from a poor quality data entry list. ☆20 Updated 8 years ago
- A node.js library for extracting data from scanned forms. ☆117 Updated 2 years ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆291 Updated 9 years ago
- ☆13 Updated 7 years ago
- Parser and standardizer for politician, individual and organization names. ☆129 Updated 7 years ago
- Index URLs in Common Crawl ☆194 Updated 7 years ago
- Open Source implementation of Summly ☆47 Updated 8 years ago
- ☆21 Updated 6 years ago
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ ☆182 Updated 6 years ago
- Text classification using Naive Bayes and Elasticsearch ☆154 Updated 8 years ago
- Model Training tool for MITIE ☆79 Updated 9 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 Updated 7 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser. ☆119 Updated 2 years ago
- mltk - Moz Language Tool Kit ☆12 Updated 9 years ago
- Dedupe/batch geocode addresses and venues around the world with libpostal ☆81 Updated 3 years ago
- ☆24 Updated 9 years ago
- Index Common Crawl archives in tabular format ☆110 Updated 2 months ago
- Exploring extracting tables from a PDF to CSV using PDF.JS ☆103 Updated 8 years ago
- Mechanical Turk on your own machine. ☆205 Updated 2 months ago
- Nodejs wrapper for Stanford Classifier. ☆47 Updated 3 years ago
- Elasticsearch entity resolution plugin based on Duke ☆210 Updated 4 years ago
- Fuzzy Categorical Distances ☆14 Updated 4 years ago
- conceptnet 4 bridge ☆71 Updated 10 years ago
- gzipstream allows Python to process multi-part gzip files from a streaming source ☆23 Updated 7 years ago