Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
☆76Feb 15, 2023Updated 3 years ago
Alternatives and similar repositories for delta-architecture
Users that are interested in delta-architecture are comparing it to the libraries listed below
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Sep 7, 2022Updated 3 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.☆30Updated this week
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor…☆70Aug 27, 2025Updated 6 months ago
- A list of backdoor samples I find online.☆13Dec 16, 2019Updated 6 years ago
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis☆10Feb 2, 2024Updated 2 years ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …☆12Mar 3, 2026Updated last week
- Apache Spark-based Data Flow (ETL) framework which supports multiple read and write destinations of different types and also supports multiple…☆26Jun 7, 2021Updated 4 years ago
- Scala API for Apache Spark SQL high-order functions☆14Aug 4, 2023Updated 2 years ago
- A highly efficient daemon for streaming data from Kafka into Delta Lake☆428May 5, 2025Updated 10 months ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆29Nov 4, 2024Updated last year
- Spark SQL DBF Library☆16Jan 2, 2015Updated 11 years ago
- This repo demonstrates how to use AWS application auto-scaling to implement custom-scaling in your Kinesis Data Analytics for Apache Flin…☆19Feb 21, 2025Updated last year
- ☆63Nov 8, 2019Updated 6 years ago
- HTML Content / Article Extractor in Scala☆18May 23, 2018Updated 7 years ago
- ☆36Aug 24, 2022Updated 3 years ago
- Plot live-stats as graph from ApacheSpark application using Lightning-viz☆18Jul 3, 2017Updated 8 years ago
- Experiments with Ooyala's Spark Job Server☆21Dec 14, 2014Updated 11 years ago
- Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines☆17Jan 21, 2020Updated 6 years ago
- Poetry plugin for creating docker images. 🏗☆18Mar 2, 2026Updated last week
- Realistic sample value generators for Scala.☆16Jul 4, 2024Updated last year
- Extensible streaming ingestion pipeline on top of Apache Spark☆46Jul 17, 2025Updated 7 months ago
- Read Delta tables without any Spark☆47Mar 8, 2024Updated 2 years ago
- An experiment to inject a customized parser using SparkSessionExtension☆16Jan 1, 2018Updated 8 years ago
- Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream …☆22Feb 6, 2017Updated 9 years ago
- Protobuf serialization support for Apache Flink☆21Jun 1, 2021Updated 4 years ago
- Spark SQL index for Parquet tables☆134May 6, 2021Updated 4 years ago
- The Internals of Delta Lake☆188Nov 30, 2025Updated 3 months ago
- Notebooks Python Machine Learning☆20Feb 6, 2020Updated 6 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆50Dec 2, 2023Updated 2 years ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆94May 9, 2025Updated 10 months ago
- Java JNI interface to the TileDB Arrays storage and query engine☆26Jan 24, 2026Updated last month
- A minimal seed template for an Akka gRPC with Scala build☆19Jan 22, 2026Updated last month
- A sink to save Spark Structured Streaming DataFrame into Hive table☆23May 7, 2018Updated 7 years ago
- A tool to validate data, built around Apache Spark.☆101Feb 19, 2026Updated 2 weeks ago
- Big Data Processing Framework - Unified Data API or SQL on Any Storage☆251Jul 10, 2025Updated 8 months ago
- ☆25Mar 15, 2024Updated last year
- A streaming key-value store implementation using native Flink Streaming operators☆23Oct 10, 2015Updated 10 years ago
- Kafka as a Datalog Engine☆28Mar 31, 2025Updated 11 months ago
- Open source stack lakehouse☆25Mar 2, 2024Updated 2 years ago