pjlsergeant / ziprip
Extract postal addresses from the DOM
☆66Updated 12 years ago
Alternatives and similar repositories for ziprip:
Users that are interested in ziprip are comparing it to the libraries listed below
- Launch AWS Elastic MapReduce jobs that process Common Crawl data.☆49Updated 8 years ago
- ☆24Updated 9 years ago
- ☆21Updated 6 years ago
- Parser and standardizer for politician, individual and organization names.☆129Updated 7 years ago
- email dataset for email signature parsing☆55Updated 8 years ago
- Mechanical Turk on your own machine.☆206Updated 6 months ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json/avro dump☆253Updated last year
- A Python canonicalizer to disambiguate and recognize known names from a poor quality data entry list.☆20Updated 9 years ago
- A language detection Web Service☆53Updated 7 years ago
- Open Source implementation of Summly☆47Updated 8 years ago
- Client for Stanford Named Entity Recognition☆27Updated 6 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework☆166Updated 3 years ago
- conceptnet 4 bridge☆71Updated 10 years ago
- Read natural language interactive queries. Great for bots.☆18Updated 8 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors☆34Updated 10 years ago
- Semanticizest: dump parser and client☆20Updated 8 years ago
- Index URLs in Common Crawl☆194Updated 7 years ago
- Nodejs text summarization☆54Updated 11 years ago
- Traptor -- A distributed Twitter feed☆26Updated 2 years ago
- A queue-controlled browser automation tool for improving web crawl quality☆61Updated last month
- A node.js library for extracting data from scanned forms.☆117Updated 2 years ago
- Fuzzy Categorical Distances☆14Updated 5 years ago
- Dedupe/batch geocode addresses and venues around the world with libpostal☆82Updated 3 years ago
- mltk - Moz Language Tool Kit☆12Updated 10 years ago
- python library for extracting html microdata☆166Updated last year
- Modularly extensible semantic metadata validator☆84Updated 9 years ago
- Akiva is a simple natural-language-processing, question-answering, artificial intelligence.☆348Updated 11 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts☆59Updated 12 years ago
- Deployment of pywb as a CommonCrawl Index Server☆21Updated 7 years ago
- Freeform Street Address Parser☆95Updated 2 years ago