supermariolabs / spooq
☆38 Updated 9 months ago
Alternatives and similar repositories for spooq:
Users who are interested in spooq are comparing it to the libraries listed below.
- Code snippets used in demos recorded for the blog. ☆29 Updated last week
- Witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. The Starter Kit showcases the … ☆21 Updated this week
- Avro Schema Evolution made easy ☆34 Updated last year
- An open specification for data products in Data Mesh ☆55 Updated 3 months ago
- Type-class-based data cleansing library for Apache Spark SQL ☆79 Updated 5 years ago
- Sample processing code using Spark 2.1+ and Scala ☆51 Updated 4 years ago
- Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆79 Updated this week
- WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging … ☆30 Updated last week
- Extensible streaming ingestion pipeline on top of Apache Spark ☆44 Updated 11 months ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 Updated last year
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 9 months ago
- Data quality control tool built on Spark and Deequ ☆24 Updated 4 months ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆41 Updated 4 months ago
- Flowchart for debugging Spark applications ☆104 Updated 4 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated last week
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Materials (slides and code) for Kafka and Kafka Streams Workshops ☆61 Updated 8 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 Updated last week
- Testing Scala code with scalatest ☆12 Updated 2 years ago
- Dione - a Spark and HDFS indexing library ☆51 Updated 11 months ago
- The official repository for the Rock the JVM Spark Optimization 2 course ☆38 Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated 11 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- A curated list of awesome lakehouse frameworks, applications, etc. ☆21 Updated last month
- Kafka Examples repository. ☆44 Updated 6 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Nested array transformation helper extensions for Apache Spark ☆37 Updated last year
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆118 Updated last week
- ☆10 Updated 2 years ago