jfchen / Spark-SQL-Twitter-Analyzer
Processes large amounts of Twitter data using Spark SQL (and its JSON support). Answers questions such as "What are the most popular languages?", "Who is most influential?", "Which time zones are most active during a day?" and more.
☆9 · Updated 10 years ago
Alternatives and similar repositories for Spark-SQL-Twitter-Analyzer:
Users who are interested in Spark-SQL-Twitter-Analyzer are comparing it to the repositories listed below:
- This project contains the code to translate between Apache Spark and SFrame. ☆20 · Updated 8 years ago
- A real-time streaming implementation of Markov chain based fraud detection ☆23 · Updated 10 years ago
- Tutorial for Deploying Anaconda Cluster and PySpark on top of Red Hat Storage GlusterFS ☆8 · Updated 10 years ago
- Spark Tutorial at the University of Maryland ☆38 · Updated 10 years ago
- Assembly of fundamental statistics implemented based on Apache Spark ☆31 · Updated 9 years ago
- PMML evaluator library for the Apache Hive data warehouse software (legacy codebase) ☆13 · Updated 10 years ago
- Spark in Kaggle competitions ☆9 · Updated 9 years ago
- Film recommendations with Apache Spark and Python ☆61 · Updated 9 years ago
- Fast-Data-Processing-with-Spark-2 ☆22 · Updated 2 years ago
- ☆41 · Updated 8 years ago
- A Spark sbt blueprint to build your own Spark apps off of (for cloud native runtime, see the kube/spark examples) ☆56 · Updated 5 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase ☆14 · Updated 9 years ago
- Real-time dashboard for Twitter sentiment analysis using Spark Streaming and Watson Tone Analyzer ☆31 · Updated 6 years ago
- Kaggle's click-through rate prediction with Spark Pipeline API ☆23 · Updated 9 years ago
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 · Updated 9 years ago
- Public code files for the DDL blog ☆56 · Updated 6 years ago
- Building blocks and patterns for building data prep transformations and feature engineering in Spark. ☆16 · Updated 9 years ago
- Parallel Iterative Algorithm (SGD) on Hadoop's YARN framework ☆42 · Updated 12 years ago
- A subproject of Predictiveworks that provides common access to Cassandra, Elasticsearch, HBase, MongoDB, Parquet, JDBC database and other… ☆13 · Updated 10 years ago
- Additional files for the Otto Group Challenge hosted by Kaggle ☆37 · Updated 10 years ago
- ☆13 · Updated 9 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 · Updated 8 years ago
- Data and code for "Fast Data Applications with Spark and Python" ☆25 · Updated 8 years ago
- An API for Distributed Machine Learning ☆154 · Updated 8 years ago
- ☆20 · Updated 8 years ago
- ☆35 · Updated 8 years ago
- A Latent Dirichlet Allocation topic modeling package based on the SparseLDA Gibbs sampling inference algorithm ☆8 · Updated 12 years ago
- Kirk's Zeppelin Notebooks ☆12 · Updated 6 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 · Updated 8 years ago
- GraphX example ☆24 · Updated 9 years ago