jfchen / Spark-SQL-Twitter-Analyzer
Processes large amounts of Twitter data using Spark SQL (and its JSON support). Answers questions like "What are the most popular languages?", "Who is most influential?", "Which time zones are most active during a day?" and more.
☆9 Updated 9 years ago
Alternatives and similar repositories for Spark-SQL-Twitter-Analyzer:
Users that are interested in Spark-SQL-Twitter-Analyzer are comparing it to the libraries listed below
- Real-time dashboard for Twitter Sentiment analysis using Spark Streaming and Watson Tone Analyzer ☆31 Updated 6 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 Updated 8 years ago
- Film recommendations with Apache Spark and Python ☆61 Updated 9 years ago
- Tutorials and samples that show you how to get the most out of IBM Analytics for Apache Spark ☆79 Updated 7 years ago
- The repository for the CMU Data Pipeline course. This year's course should use branch 2017 ☆40 Updated 7 years ago
- Machine Learning over Twitter's stream. Using Apache Spark, Web Server and Lightning Graph server. ☆27 Updated 8 years ago
- GPU Acceleration for Apache Spark ☆34 Updated 9 years ago
- Tutorial for Deploying Anaconda Cluster and PySpark on top of Red Hat Storage GlusterFS ☆8 Updated 10 years ago
- Spark Tutorial at the University of Maryland ☆38 Updated 10 years ago
- Examples of Integrating Spark Streaming, Flume, and HBase to solve Streaming problems ☆19 Updated 11 years ago
- A real time streaming implementation of markov chain based fraud detection ☆23 Updated 10 years ago
- Additional useful algorithms that can be used with spark. ☆24 Updated 10 years ago
- Assembly of fundamental statistics implemented based on Apache Spark ☆31 Updated 9 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 Updated 8 years ago
- Page Rank, Inverted Index and Matrix Multiplication ☆9 Updated 7 years ago
- Examples for Fast Data Processing with Spark ☆59 Updated 11 years ago
- Public code files for the DDL blog ☆56 Updated 6 years ago
- Spark in Kaggle competitions ☆9 Updated 9 years ago
- DEPRECATED! Use https://github.com/h2oai/sparkling-water repository! H2O and Spark interoperability based on Tachyon. ☆44 Updated 10 years ago
- A subproject of Predictiveworks that provides common access to Cassandra, Elasticsearch, HBase, MongoDB, Parquet, JDBC database and other… ☆13 Updated 10 years ago
- Coding exercises for Apache Spark ☆104 Updated 9 years ago
- Training materials for Strata, AMP Camp, etc ☆149 Updated 9 years ago
- PMML evaluator library for the Apache Hive data warehouse software (legacy codebase) ☆13 Updated 10 years ago
- Data and code for "Fast Data Applications with Spark and Python" ☆25 Updated 8 years ago
- Social Media Data Mining and Analytics - HyperLogLog, BloomFilter and CountMinSketch with Scalding & Algebird ☆27 Updated 6 years ago
- This repository contains code files specifically IPython notebooks for the assignments in the course "Scalable Machine Learning" by UC Be… ☆30 Updated 9 years ago
- Kaggle's click through rate prediction with Spark Pipeline API ☆23 Updated 9 years ago