paypay / DataEngineerChallenge
☆23Updated 3 years ago
Alternatives and similar repositories for DataEngineerChallenge
Users that are interested in DataEngineerChallenge are comparing it to the libraries listed below
- A Python PySpark Project with Poetry☆24Updated 6 months ago
- Repository used for Spark Trainings☆54Updated 2 years ago
- Magic to help Spark pipelines upgrade☆34Updated last year
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,…☆89Updated 4 years ago
- Data validation library for PySpark 3.0.0☆33Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆56Updated 2 years ago
- Airflow training for the crunch conf☆105Updated 7 years ago
- Weekly Data Engineering Newsletter☆96Updated last year
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian☆227Updated 2 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform☆47Updated last year
- Spark Examples☆127Updated 4 years ago
- ☆12Updated 3 years ago
- Code snippets used in demos recorded for the blog.☆37Updated 3 weeks ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.☆420Updated this week
- Various data stream/batch processing demos with Apache Spark in Scala 🚀☆11Updated 5 years ago
- (project & tutorial) dag pipeline tests + ci/cd setup☆90Updated 4 years ago
- The official repository for the Rock the JVM Spark Optimization with Scala course☆58Updated 2 years ago
- PySpark phonetic and string matching algorithms☆41Updated last year
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR☆175Updated 8 months ago
- Spark style guide☆271Updated last year
- Repository of sample Databricks notebooks☆277Updated last year
- Dockerizing an Apache Spark Standalone Cluster☆42Updated 3 years ago
- Snowflake Guide: Building a Recommendation Engine Using Snowflake & Amazon SageMaker☆32Updated 4 years ago
- Educational notes, hands-on problems w/ solutions for the Hadoop ecosystem☆87Updated 7 years ago
- Filling in the Spark function gaps across APIs☆50Updated 4 years ago
- How to manage Slowly Changing Dimensions with Apache Hive☆55Updated 6 years ago
- The official repository for the Rock the JVM Spark Optimization 2 course☆42Updated 2 years ago
- An example PySpark project with pytest☆18Updated 8 years ago
- ☆314Updated 7 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes☆63Updated 3 years ago