bagrii / address_extraction
Extracting addresses from text
☆42Updated 6 years ago
Alternatives and similar repositories for address_extraction:
Users who are interested in address_extraction are comparing it to the libraries listed below
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆55Updated last year
- Named Entity Recognition project whose goal is to detect brands in eBay/Amazon product titles.☆85Updated 7 years ago
- This repository contains an implementation of a US address parser built using spaCy NLP library.☆37Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line☆122Updated last month
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.☆148Updated 3 weeks ago
- Extract dates from text☆64Updated 4 years ago
- Matches a category of Google's Taxonomy to a product that is described in any kind of text data☆61Updated 6 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 5 years ago
- Python address detector and parser☆206Updated last year
- Find any kind of occupation or job title in a text or file☆83Updated last year
- Ultimate Website Sitemap Parser☆190Updated this week
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.☆126Updated 10 months ago
- Train a model, and detect gibberish strings with it.☆60Updated 3 years ago
- A simple algorithm for clustering web pages, suitable for crawlers☆34Updated 7 years ago
- An efficient simhash implementation for Python☆124Updated 5 years ago
- Use ML-Annotate to label data for machine learning purposes☆107Updated 4 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs).☆90Updated 3 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.☆44Updated 7 years ago
- API - extract a list of keywords from a text.☆18Updated 7 years ago
- Trying to generate name synonyms from Wikidata☆32Updated 4 years ago
- Scrapy pipeline which allows you to store Scrapy items in a Solr server.☆19Updated 8 years ago
- Index Common Crawl archives in tabular format☆110Updated 3 months ago
- Textpipe: clean and extract metadata from text☆302Updated 3 years ago
- Convert a corpus of PDFs to clean text files on a distributed architecture☆38Updated 11 months ago
- How Media Cloud approaches extracting metadata from online news stories☆12Updated 2 months ago
- Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning☆309Updated this week
- A fully customisable language detection pipeline for spaCy☆92Updated 5 years ago
- A helper library full of URL-related heuristics.☆64Updated 4 months ago
- Tag news stories based on models trained on the NYT corpus.☆42Updated last year
- Extract text from HTML☆133Updated 4 years ago