jaumpedro214 / posts
A list of all my posts and personal projects
☆71 Updated 11 months ago
Alternatives and similar repositories for posts
Users who are interested in posts are comparing it to the repositories listed below
- A series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆51 Updated last year
- DataTalks Workshop Materials ☆18 Updated last year
- Create streaming data, send it to Kafka, transform it with PySpark, and load it into Elasticsearch and MinIO ☆60 Updated last year
- Code for my "Efficient Data Processing in SQL" book. ☆56 Updated 9 months ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆76 Updated 11 months ago
- ☆87 Updated 2 years ago
- ☆130 Updated 3 months ago
- Code for dbt tutorial ☆157 Updated 11 months ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆22 Updated 3 years ago
- ☆34 Updated last year
- End to end data engineering project ☆54 Updated 2 years ago
- ☆40 Updated 10 months ago
- A self-contained, ready to run Airflow ELT project. Can be run locally or within codespaces. ☆68 Updated last year
- This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs. ☆70 Updated 9 months ago
- A tutorial for the Great Expectations library. ☆71 Updated 4 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆103 Updated 4 years ago
- Sample project to demonstrate data engineering best practices ☆190 Updated last year
- Project for "Data pipeline design patterns" blog. ☆45 Updated 9 months ago
- Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along with material in t… ☆30 Updated last year
- End to end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆93 Updated last month
- Data lake, data warehouse on GCP ☆56 Updated 3 years ago
- Processing TfL data for bike usage with Google Cloud Platform. ☆45 Updated 2 years ago
- ☆181 Updated 4 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆144 Updated 4 years ago
- Code for "Advanced data transformations in SQL" free live workshop ☆81 Updated last week
- ☆87 Updated 4 months ago
- How to unit test your PySpark code ☆28 Updated 4 years ago
- A Glue ETL job or EMR Spark job that reads from the Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog ☆11 Updated last year
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆261 Updated 10 months ago
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆45 Updated 5 years ago