pjlsergeant / ziprip
Extract postal addresses from the DOM
☆66 · Updated 12 years ago
Alternatives and similar repositories for ziprip
Users who are interested in ziprip compare it to the libraries listed below.
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 · Updated 8 years ago
- Model Training tool for MITIE ☆79 · Updated 9 years ago
- gzipstream allows Python to process multi-part gzip files from a streaming source ☆23 · Updated 8 years ago
- Index URLs in Common Crawl ☆194 · Updated 7 years ago
- mltk - Moz Language Tool Kit ☆12 · Updated 10 years ago
- Keeps a mirror of DBpedia live in sync ☆26 · Updated 3 years ago
- Client for Stanford Named Entity Recognition ☆27 · Updated 6 years ago
- ☆24 · Updated 9 years ago
- Semanticizest: dump parser and client ☆20 · Updated 9 years ago
- A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser. ☆119 · Updated 3 years ago
- Fuzzy Categorical Distances ☆14 · Updated 5 years ago
- A Node.js library for extracting data from scanned forms. ☆117 · Updated 2 years ago
- Free & ready-to-use geocoder ☆57 · Updated 8 years ago
- ☆21 · Updated 7 years ago
- Dedupe/batch geocode addresses and venues around the world with libpostal ☆83 · Updated 3 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 · Updated 6 years ago
- Text classification using Naive Bayes and Elasticsearch ☆154 · Updated 8 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 · Updated 12 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 · Updated 3 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 · Updated 8 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 · Updated 12 years ago
- Json Wikipedia: contains code to convert the Wikipedia XML dump into a JSON dump. Questions? https://gitter.im/idio-opensource/Lobby ☆17 · Updated 3 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆167 · Updated 3 years ago
- ☆50 · Updated 4 years ago
- Parser and standardizer for politician, individual and organization names. ☆129 · Updated 8 years ago
- Search for similar short strings ☆52 · Updated 4 years ago
- Mechanical Turk on your own machine. ☆206 · Updated 7 months ago
- ☆13 · Updated 8 years ago
- ☆14 · Updated 8 years ago
- Server endpoint for communicating with stanford-ner server ☆25 · Updated 7 years ago