Delta-Lake, ETL, Spark, Airflow
☆49Oct 9, 2022Updated 3 years ago
Alternatives and similar repositories for AcidOnSpark-ETL
Users that are interested in AcidOnSpark-ETL are comparing it to the libraries listed below.
- ☆14Oct 10, 2025Updated 6 months ago
- ☆13May 11, 2025Updated 11 months ago
- ☆22Feb 5, 2024Updated 2 years ago
- ☆271Oct 23, 2024Updated last year
- Spark and Hive docker containers sharing a common MySQL metastore☆26Apr 17, 2020Updated 5 years ago
- This repo generates data from an existing dataset to a file, or produces dataset rows as messages to Kafka in a streaming manner.☆22Jun 13, 2024Updated last year
- A better SmartyStreets/LiveAddress API library for Python☆12Jan 2, 2025Updated last year
- A write-audit-publish implementation on a data lake without the JVM☆45Aug 12, 2024Updated last year
- Firefox extension that shows parquet schema when going over GCP cloud storage. Uses DuckDB WASM☆12Jan 19, 2024Updated 2 years ago
- A demo instance of mage for pulling sample data from a public Google pub/sub topic and transforming with dbt.☆12Jan 5, 2024Updated 2 years ago
- ☆41Jul 4, 2022Updated 3 years ago
- The elegance of Airflow + the power of AWS☆51Feb 5, 2024Updated 2 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D…☆29Aug 8, 2020Updated 5 years ago
- ☆10Jul 21, 2022Updated 3 years ago
- Collection of dockerized ETL jobs managed by data engineering.☆22Updated this week
- A Python package to help Databricks Unity Catalog users to read and query Delta Lake tables with Polars, DuckDb, or PyArrow.☆27Mar 25, 2024Updated 2 years ago
- ☆14Updated this week
- ☆11Mar 7, 2021Updated 5 years ago
- Repository for building docker image, with open-source applications☆26Apr 23, 2024Updated last year
- RedditR for Content Engagement and Recommendation☆18Dec 21, 2017Updated 8 years ago
- The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos…☆79Feb 27, 2023Updated 3 years ago
- ☆24Jul 24, 2024Updated last year
- This repository contains a make script and instructions on how to set up a local hdfs+spark+hive environment.☆19Jul 29, 2016Updated 9 years ago
- Wrapper for SurveyGizmo's restful API service☆16Sep 24, 2020Updated 5 years ago
- Nyc_Taxi_Data_Pipeline - DE Project☆140Oct 21, 2024Updated last year
- Ansible playbooks for Apache Spark on kube☆27Jul 20, 2017Updated 8 years ago
- A minimal docker compose setup for experimenting with cloud agnostic Lakehouse Architectures Apache Spark with Hive Metastore + Delta Lak…☆34Apr 17, 2024Updated 2 years ago
- Data Engineering Projects using Mage.ai as orchestrator☆18Jan 20, 2026Updated 2 months ago
- A two part tutorial for Ray Core APIs and Ray Serve for Model Deployment☆21Jun 9, 2022Updated 3 years ago
- Telescopes, Workflows and Data Services for the Academic Observatory☆18Updated this week
- Repository with a simple and clear tutorial on Polars, a data analysis library for Python and an alternative to Pandas.☆40Feb 24, 2023Updated 3 years ago
- High Performance Go Driver for Bytehouse☆14Jun 11, 2025Updated 10 months ago
- Examples and code to assign a name to your MongoDB, MySQL, NATS, Oracle, PostgreSQL, RabbitMQ, and redis connection.☆28Updated this week
- ☆15Nov 16, 2023Updated 2 years ago
- ☆16Jun 5, 2023Updated 2 years ago
- ☆17Jun 8, 2025Updated 10 months ago
- universal-datalakehouse-postgres-ingestion-deltastreamer☆11Apr 7, 2024Updated 2 years ago
- Generate and Compare Debezium CDC (Change Data Capture) Avro Schema, directly from your Database.☆25Updated this week
- A partially implemented ODBC driver for the Trino distributed SQL engine☆18Feb 2, 2026Updated 2 months ago