cldellow / real-estate-prices-cc
Source real estate prices from the Common Crawl.
☆27 Updated 6 years ago
Alternatives and similar repositories for real-estate-prices-cc:
Users who are interested in real-estate-prices-cc are comparing it to the libraries listed below.
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Text analysis for automatic bookmarking/keyword extraction ☆18 Updated 8 years ago
- classify a job description (or noisy job title) into a ONET job title ☆19 Updated 8 years ago
- Language-agnostic political event coding using universal dependencies ☆18 Updated 5 years ago
- Tool for sentiment analysis annotation ☆12 Updated 5 months ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 Updated 5 years ago
- Scraping Assisted by Learning ☆35 Updated this week
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea ☆13 Updated 8 years ago
- Parsing resumes in a PDF format from LinkedIn ☆68 Updated 8 years ago
- LinkRun - Data Engineering project done in 3 weeks during the Insight fellowship ☆38 Updated 4 years ago
- Predict age and gender from a first name ☆60 Updated 6 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆13 Updated 3 weeks ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ☆148 Updated last month
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- Broad crawler for domain discovery ☆19 Updated 6 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- [archived] ☆18 Updated 3 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 Updated 6 years ago
- Using ML to extract campaign finance data from messy forms for journalism ☆76 Updated 2 years ago
- R code needed to reproduce Relationship between Reddit Comment Score and Comment Length for 1.66 Billion Comments visualization ☆18 Updated 9 years ago
- Scrapes sites. Gets news. Eventually events. ☆84 Updated 9 years ago
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz… ☆40 Updated 5 years ago
- A search engine for Open Data ☆53 Updated 2 years ago
- Machine learning model to recommend related content ☆19 Updated last year
- ☆16 Updated 6 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 3 years ago
- ☆12 Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Techniques for Scraping the Web in Python ☆26 Updated 6 years ago