arezamoosavi/AcidOnSpark-ETL

arezamoosavi / AcidOnSpark-ETL

Delta-Lake, ETL, Spark, Airflow

☆50

Alternatives and similar repositories for AcidOnSpark-ETL

Users who are interested in AcidOnSpark-ETL are comparing it to the repositories listed below.

andrejnevesjr / airflow-spark-minio-postgres
Project with Airflow + Spark + MinIO + Postgres + Python3.8
☆29 · Sep 9, 2022 · Updated 3 years ago
yTek01 / docker-spark-airflow
☆41 · Jan 24, 2023 · Updated 3 years ago
mrn-aglic / apache-iceberg-data-exploration
☆23 · Feb 5, 2024 · Updated 2 years ago
dacort / modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
☆47 · Jul 13, 2022 · Updated 4 years ago
bennylope / smartystreets.py
A better SmartyStreets/LiveAddress API library for Python
☆12 · Jan 2, 2025 · Updated last year
treeverse / lakeFS-hooks
a simple lakeFS webhook for pre-commit and pre-merge validation of data objects
☆13 · Nov 9, 2023 · Updated 2 years ago
mehd-io / parquet-info-firefox-extension
Firefox extension that shows parquet schema when going over GCP cloud storage. Use DuckDB WASM
☆12 · Jan 19, 2024 · Updated 2 years ago
BauplanLabs / no-jvm-wap-with-iceberg
A write-audit-publish implementation on a data lake without the JVM
☆45 · Aug 12, 2024 · Updated last year
analyticsdurgesh / StreamCommerce-Lakehouse-360
Production-style real-time e-commerce lakehouse with Kafka, Airflow, Databricks, Medallion architecture, data quality, quarantine, Terraf…
☆31 · May 30, 2026 · Updated last month
mage-ai / magic-devcontainer
A demo instance of mage for pulling sample data from a public Google pub/sub topic and transforming with dbt.
☆12 · Jan 5, 2024 · Updated 2 years ago
danthelion / trino-minio-iceberg-example
☆42 · Jul 4, 2022 · Updated 4 years ago
knowsuchagency / orkestra
The elegance of Airflow + the power of AWS
☆51 · Feb 5, 2024 · Updated 2 years ago
implydata / druid-datagenerator
A data generator for Apache Druid
☆12 · Mar 26, 2025 · Updated last year
pran4ajith / spark-twitter-streaming
A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D…
☆29 · Aug 8, 2020 · Updated 5 years ago
itversity / data-engineering-spark
☆95 · Sep 14, 2022 · Updated 3 years ago
adswerve / google_analytics_flattener
Google Cloud Platform solution that provides an event driven process that flattens (unnests) Google Analytics 360 data that has been expo…
☆16 · Apr 13, 2026 · Updated 3 months ago
danielbeach / lakescum
A Python package to help Databricks Unity Catalog users to read and query Delta Lake tables with Polars, DuckDb, or PyArrow.
☆27 · Mar 25, 2024 · Updated 2 years ago
luongphambao / nyc-taxi-feature-store
☆57 · Aug 14, 2024 · Updated last year
mozilla / docker-etl
Collection of dockerized ETL jobs managed by data engineering.
☆23 · Jul 22, 2026 · Updated last week
arempter / hive-metastore-docker
Example for article Running Spark 3 with standalone Hive Metastore 3.0
☆100 · Jan 31, 2023 · Updated 3 years ago
Akrog / gcs-client
Google Cloud Storage Python Client
☆14 · Dec 26, 2022 · Updated 3 years ago
ion-bostanica / spark-minio-delta-lakehouse-docker
A minimal docker compose setup for experimenting with cloud agnostic Lakehouse Architectures Apache Spark with Hive Metastore + Delta Lak…
☆34 · Apr 17, 2024 · Updated 2 years ago
quixio / streaming-academy
☆10 · Jul 24, 2024 · Updated 2 years ago
SweetAdjPotato / machine-learning-algorithms-with-without-libraries
☆11 · Mar 7, 2021 · Updated 5 years ago
FrancescoXX / flask-crud-live
☆24 · Jul 24, 2024 · Updated 2 years ago
rpkilby / SurveyGizmo
Wrapper for SurveyGizmo's restful API service
☆16 · Sep 24, 2020 · Updated 5 years ago
mgustineli / DSA-Python-Book
Repo for storing my solutions for the exercises in the Book "Data Structures and Algorithms in Python" by Goodrich, Tamassia, and Goldwas…
☆19 · Apr 11, 2022 · Updated 4 years ago
mcastellin / yt-docker-tricks-examples
A repository to store example files and projects for my YouTube series **Docker Development Tips & Tricks**
☆13 · Dec 1, 2021 · Updated 4 years ago
cordon-thiago / airflow-spark
Docker with Airflow and Spark standalone cluster
☆264 · Aug 5, 2023 · Updated 2 years ago
trannhatnguyen2 / NYC_Taxi_Data_Pipeline
Nyc_Taxi_Data_Pipeline - DE Project
☆151 · Oct 21, 2024 · Updated last year
alexdebrie / serverless-dynamodb-scanner
A Serverless project to help you operate on every existing item in a DynamoDB table
☆17 · Mar 5, 2019 · Updated 7 years ago
dmatrix / ray-core-serve-tutorial-mlops
A two part tutorial for Ray Core APIs and Ray Serve for Model Deployment
☆21 · Jun 9, 2022 · Updated 4 years ago
The-Academic-Observatory / academic-observatory-workflows
Telescopes, Workflows and Data Services for the Academic Observatory
☆18 · Jul 21, 2026 · Updated last week
elyra-ai / airflow-notebook
This repository is no longer maintained.
☆15 · Mar 10, 2022 · Updated 4 years ago
bytehouse-cloud / driver-go
High Performance Go Driver for Bytehouse
☆15 · Jun 11, 2025 · Updated last year
AnaisUrlichs / react-article-display
☆15 · Nov 16, 2023 · Updated 2 years ago
soumilshah1995 / universal-datalakehouse-postgres-ingestion-deltastreamer
universal-datalakehouse-postgres-ingestion-deltastreamer
☆10 · Apr 7, 2024 · Updated 2 years ago
Blue9 / Summarizer
An automatic paraphraser/summarizer/information extractor built using Python.
☆18 · Apr 1, 2016 · Updated 10 years ago
dunghoang369 / feature-store
☆73 · Jan 16, 2024 · Updated 2 years ago