Spark all the ETL Pipelines
☆37Aug 2, 2023Updated 2 years ago
Alternatives and similar repositories for SparkETL
Users who are interested in SparkETL are comparing it to the libraries listed below.
- ☆10Jun 3, 2023Updated 2 years ago
- ☆19Jul 17, 2021Updated 4 years ago
- Streaming analytics project with eventsim and Kafka☆13Dec 23, 2022Updated 3 years ago
- Clickstream Faker Provider for Python.☆11Apr 2, 2022Updated 3 years ago
- Built a Data Pipeline for a Retail store using AWS services that collects data from its transactional database (OLTP) in Snowflake and tr…☆11May 25, 2023Updated 2 years ago
- Identify and tokenize sensitive data automatically using Cloud DLP and Dataflow☆45Oct 27, 2025Updated 4 months ago
- ☆22Jan 22, 2018Updated 8 years ago
- In-browser data analysis using SQL | Powered by duckdb-wasm☆26Dec 21, 2025Updated 3 months ago
- Final project completed after participating in the Data Engineering Zoomcamp☆11Apr 4, 2022Updated 3 years ago
- My first attempt at a rough ETL pipeline; technologies include spark, GCS, prefect orchestration, and terraform☆14Oct 12, 2022Updated 3 years ago
- ☆16Mar 9, 2026Updated 2 weeks ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.☆30Mar 9, 2026Updated 2 weeks ago
- An operator for managing Alluxio system on Kubernetes cluster☆13Jan 9, 2024Updated 2 years ago
- Distributed System in Docker with Apache Kafka and Spark for big data streaming and visualisation (NodeJS, TypeScript, React, NestJS, Jav…☆24Apr 28, 2019Updated 6 years ago
- My *nix dotfiles☆12Jul 4, 2025Updated 8 months ago
- DuckDB Copilot Extension☆10Jan 12, 2026Updated 2 months ago
- Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.☆16Jan 4, 2026Updated 2 months ago
- Copy My Writing is a command-line tool for generating content based on your personal writing style.☆11Oct 12, 2025Updated 5 months ago
- Get map value via dot-delimited path or nil.☆30Sep 9, 2014Updated 11 years ago
- Calico API☆23Updated this week
- Source code of the institutional insights TradingView indicator.☆10Jan 30, 2025Updated last year
- ☆12Aug 26, 2024Updated last year
- Zabbix Template (>2.4) and resources useful to monitor zfs on linux (zpool)☆13Jan 26, 2017Updated 9 years ago
- CLI secret management☆16Updated this week
- Apache Polaris Tools, additional tooling for Apache Polaris☆25Mar 16, 2026Updated last week
- Contains spark dataframe solutions of leetcode questions☆24Dec 13, 2022Updated 3 years ago
- Open source package for Survival Analysis modeling☆23Feb 3, 2020Updated 6 years ago
- A foreign data wrapper for PostgreSQL allowing easy accessing of Apache ORC formatted data files.☆11Sep 21, 2020Updated 5 years ago
- OpenKruise Helm Charts.☆16Mar 10, 2026Updated 2 weeks ago
- Automated TPC-DS and TPC-H benchmark for Apache Hive LLAP☆10Jul 18, 2022Updated 3 years ago
- Source code for TPCx-BB benchmark for Hive and SparkSQL on scale factor of 300 GB☆10Jun 26, 2018Updated 7 years ago
- Stardog Visual Studio Code Extensions☆17Jul 10, 2025Updated 8 months ago
- type-safe event bus library for Go with full lifecycle☆13Sep 26, 2025Updated 5 months ago
- Bigdata on Kubernetes, Published by Packt☆36Oct 1, 2024Updated last year
- ☆10Jan 28, 2025Updated last year
- A high-performance PDF summarization tool powered by Google's Gemma 3 LLM. Features parallel processing, async operations, and intelligen…☆20Apr 12, 2025Updated 11 months ago
- Open Data Stack Projects: Examples of End to End Data Engineering Projects☆91Jun 25, 2023Updated 2 years ago
- ☆13Mar 9, 2026Updated 2 weeks ago
- Run ansible-lint with reviewdog 🐕☆16Jan 22, 2026Updated 2 months ago