cldellow / real-estate-prices-cc
Source real estate prices from the Common Crawl.
☆27 Updated 6 years ago
Alternatives and similar repositories for real-estate-prices-cc:
Users who are interested in real-estate-prices-cc compare it with the repositories listed below.
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Text analysis for automatic bookmarking/keyword extraction ☆18 Updated 8 years ago
- ☆11 Updated 5 years ago
- Scraping Assisted by Learning ☆35 Updated 2 weeks ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- Language-agnostic political event coding using universal dependencies ☆18 Updated 5 years ago
- Scraping Tweet data for Russian Troll Twitter accounts into Neo4j ☆57 Updated 7 years ago
- Train a neural network optimized for generating Reddit subreddit posts ☆29 Updated 7 years ago
- Turning news into events since 2014. ☆51 Updated 8 years ago
- Jupyter notebook + Code for reproducing Reddit Subreddit graphs ☆18 Updated 8 years ago
- Classify a job description (or noisy job title) into a ONET job title ☆19 Updated 8 years ago
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea ☆13 Updated 8 years ago
- Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence ☆64 Updated last year
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- ☆34 Updated last year
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- Package for performing Reddit-based text analysis ☆22 Updated 6 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- Code and visualizations for related/similar subreddits ☆19 Updated 8 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Materials to reproduce findings in our story, "Google’s Top Search Result? Surprise! It’s Google" ☆34 Updated 4 years ago
- LinkRun - Data Engineering project done in 3 weeks during the Insight fellowship ☆38 Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Tools for tracking stories on news homepages ☆48 Updated 5 years ago
- ☆16 Updated 6 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 3 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- Jupyter notebook + Code for scraping AngelList data and making an interactive chart of SFBA salaries/equity ☆14 Updated 8 years ago
- Events and Situations Ontology ☆14 Updated 7 years ago