kadnan / docker-spark-cluster
A simple Spark standalone cluster for testing environments
☆23 Updated 4 years ago
Alternatives and similar repositories for docker-spark-cluster:
Users who are interested in docker-spark-cluster are comparing it to the repositories listed below:
- Flowchart for debugging Spark applications ☆104 Updated 4 months ago
- The official repository for the Rock the JVM Spark Optimization with Scala course ☆57 Updated last year
- Repository used for Spark Trainings ☆53 Updated last year
- Various data stream/batch process demo with Apache Scala Spark 🚀 ☆11 Updated 4 years ago
- Weekly Data Engineering Newsletter ☆94 Updated 7 months ago
- Spark and Delta Lake Workshop ☆22 Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- The official repository for the Rock the JVM Spark Optimization 2 course ☆38 Updated last year
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 9 months ago
- Spark functions to run popular phonetic and string matching algorithms ☆60 Updated 2 years ago
- type-class based data cleansing library for Apache Spark SQL ☆79 Updated 5 years ago
- scaffold of Apache Airflow executing Docker containers ☆85 Updated 2 years ago
- Filling in the Spark function gaps across APIs ☆50 Updated 3 years ago
- ☆63 Updated 5 years ago
- Use Airflow to move data from multiple MySQL databases to BigQuery ☆100 Updated 4 years ago
- Benchmark data warehouses under Fivetran-like conditions ☆165 Updated 2 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline ☆74 Updated 2 years ago
- ☆198 Updated last year
- RedditR for Content Engagement and Recommendation ☆21 Updated 7 years ago
- ☆72 Updated 3 years ago
- How to build an awesome data engineering team ☆99 Updated 5 years ago
- Examples To Help You Learn Apache Spark ☆77 Updated 6 years ago
- ☆245 Updated 5 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 Updated 5 years ago
- Template for Spark Projects ☆101 Updated 9 months ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 Updated 4 years ago
- Repository of sample Databricks notebooks ☆254 Updated 10 months ago
- Real-world Spark pipelines examples ☆83 Updated 6 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- Sample Spark Code ☆91 Updated 6 years ago