kurlare / spark_streaming_demo
Spark Streaming with Kafka and Wikipedia Edits
☆11 · Updated 8 years ago
Alternatives and similar repositories for spark_streaming_demo:
Users who are interested in spark_streaming_demo are comparing it to the repositories listed below.
- CLI tool to launch Spark jobs on AWS EMR ☆67 · Updated last year
- A collection of airflow sample workflows for data processing on aws ☆12 · Updated 7 years ago
- A short course on the new, experimental features by The Data Incubator and O'Reilly Strata. ☆16 · Updated 8 years ago
- Make your libraries magically appear in Databricks. ☆47 · Updated last year
- Example unit tests for Apache Spark Python scripts using the py.test framework ☆84 · Updated 8 years ago
- Supporting content (slides and exercises) for the Addison-Wesley (Pearson) video series covering best practices for developing scalable S… ☆66 · Updated 9 years ago
- Airflow workflow management platform chef cookbook. ☆71 · Updated 5 years ago
- Unit and integration testing with PySpark can be tough to figure out, let's make that easier. ☆22 · Updated 9 years ago
- Scala: The Unpredicted Lingua Franca for Data Science ☆129 · Updated 6 years ago
- PyGotham 2017: Spark Streaming for World Domination (and other projects) ☆10 · Updated 7 years ago
- Installation guide for Apache Spark + Hadoop on Mac/Linux ☆59 · Updated 7 years ago
- A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB ☆94 · Updated 4 years ago
- A short guide for transitioning from Python to Scala ☆65 · Updated 9 years ago
- Materials for PyData at Strata/Hadoop World San Jose 2015 ☆12 · Updated 9 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆89 · Updated 9 years ago
- A simple Scala Based Project Template for Apache Spark ☆22 · Updated 8 years ago
- Source for "RDDs, DataFrames and Datasets in Apache Spark" NEScala presentation ☆15 · Updated 8 years ago
- Source Material for using Python and Hadoop together ☆13 · Updated 7 years ago
- User-friendly Teradata client for Python ☆107 · Updated 3 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆76 · Updated 6 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 · Updated 5 years ago
- PyAthenaJDBC is an Amazon Athena JDBC driver wrapper for the Python DB API 2.0 (PEP 249). ☆95 · Updated last year
- Training materials for Strata, AMP Camp, etc ☆150 · Updated 9 years ago
- ELT Code for your Data Warehouse ☆26 · Updated last year
- Content for architecting a data science platform for products using Luigi, Spark & Flask. ☆163 · Updated 5 years ago
- A Getting Started Guide for developing and using Airflow Plugins ☆94 · Updated 6 years ago
- Sample Spark Code ☆91 · Updated 6 years ago
- A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR ☆118 · Updated 8 years ago