ksbg / sparklanes
A lightweight data processing framework for Apache Spark
☆16 · Updated last year
Related projects
Alternatives and complementary repositories for sparklanes
- Set up a 3 node spark cluster using docker containers ☆33 · Updated 6 years ago
- Helping you get Airflow running in production. ☆9 · Updated 5 years ago
- Code for my presentation: Using PySpark to Process Boat Loads of Data ☆20 · Updated 7 years ago
- An example PySpark project with pytest ☆17 · Updated 7 years ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course I gave to one of our clients in Dece… ☆10 · Updated 8 years ago
- Simple demonstration of how to build a complex real time machine learning visualization tool. ☆16 · Updated 8 years ago
- PySpark phonetic and string matching algorithms ☆35 · Updated 9 months ago
- Developing a Lambda Architecture pipeline using Apache Kafka, Spark Structured Streaming, Redshift, S3, Python ☆24 · Updated 4 years ago
- ☆25 · Updated 5 years ago
- Various data stream/batch process demo with Apache Scala Spark 🚀 ☆11 · Updated 4 years ago
- Repository used for Spark Trainings ☆53 · Updated last year
- Model management example using Polyaxon, Argo and Seldon ☆23 · Updated 6 years ago
- Updated repository ☆157 · Updated 2 years ago
- Docker container for Kafka - Spark Streaming - Cassandra ☆97 · Updated 5 years ago
- Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References ☆70 · Updated 5 years ago
- Airflow training for the crunch conf ☆105 · Updated 6 years ago
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines. ☆46 · Updated last year
- How to do data science with Optimus, Spark and Python. ☆18 · Updated 5 years ago
- Learn the pyspark API through pictures and simple examples ☆168 · Updated 3 years ago
- A simple introduction to using spark ml pipelines ☆26 · Updated 6 years ago
- PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 ☆83 · Updated 4 years ago
- Real-time report dashboard with Apache Kafka, Apache Spark Streaming and Node.js ☆49 · Updated last year
- A curated list of articles, papers and tools for managing the building and deploying of machine learning models, aka machine learning eng… ☆18 · Updated 6 years ago
- Code for Packt Publishing's Spark for Data Science Cookbook. ☆22 · Updated 7 years ago
- Asynchronous actions for PySpark ☆45 · Updated 2 years ago
- Workshop for Spark and Databricks ☆54 · Updated 4 years ago
- Use Airflow to move data from multiple MySQL databases to BigQuery ☆99 · Updated 4 years ago
- Repo for all my code on the articles I post on medium ☆105 · Updated 2 years ago