Extensible streaming ingestion pipeline on top of Apache Spark
☆46 · updated Jul 17, 2025 (8 months ago)
Alternatives and similar repositories for hyperdrive
Users interested in hyperdrive also compare it to the libraries listed below.
- Resilient data pipeline framework running on Apache Spark · ☆26 · updated this week
- Dynamic Conformance Engine · ☆32 · updated Oct 17, 2025 (5 months ago)
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark · ☆29 · updated Nov 4, 2024 (last year)
- Scala API for Apache Spark SQL high-order functions · ☆14 · updated Aug 4, 2023 (2 years ago)
- A JDBC streaming source for Spark · ☆10 · updated Feb 19, 2024 (2 years ago)
- Avro SerDe for Apache Spark structured APIs · ☆241 · updated Jun 10, 2025 (9 months ago)
- R COBOL DI (Data Integration) Package: import COBOL CopyBook data files directly into R as properly structured data frames · ☆15 · updated Aug 7, 2024 (last year)
- Nested array transformation helper extensions for Apache Spark · ☆37 · updated Aug 4, 2023 (2 years ago)
- ☆16 · updated Apr 9, 2019 (6 years ago)
- Task Metrics Explorer · ☆14 · updated Apr 2, 2019 (6 years ago)
- A simple Spark-powered ETL framework that just works 🍺 · ☆185 · updated Oct 2, 2025 (5 months ago)
- EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Goo… · ☆44 · updated Aug 26, 2024 (last year)
- Tools for faster and optimized interaction with Teradata and large datasets · ☆17 · updated Jul 11, 2018 (7 years ago)
- A simplified, lightweight ETL framework based on Apache Spark · ☆587 · updated Jan 24, 2024 (2 years ago)
- This is a mirror of https://github.com/LucaCanali/sparkMeasure; sparkMeasure is a tool for performance troubleshooting of Apache Spark w… · ☆16 · updated Oct 3, 2025 (5 months ago)
- Spark Structured Streaming JDBC Sink · ☆16 · updated Apr 26, 2021 (4 years ago)
- Big Data Processing Framework: Unified Data API or SQL on Any Storage · ☆251 · updated Jul 10, 2025 (8 months ago)
- Open Source Secret Provider plugin for the Kafka Connect framework · ☆47 · updated Jul 19, 2024 (last year)
- Spark SQL index for Parquet tables · ☆134 · updated May 6, 2021 (4 years ago)
- Search inside Snowden/NSA/GCHQ/whatever documents · ☆11 · updated Jul 11, 2014 (11 years ago)
- Data quality tools for Big Data · ☆19 · updated Oct 10, 2019 (6 years ago)
- Gather system information about Airflow processes · ☆18 · updated Mar 12, 2020 (6 years ago)
- WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging … · ☆31 · updated Oct 28, 2025 (4 months ago)
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… · ☆62 · updated Sep 6, 2024 (last year)
- Spark package to "plug" holes in data using SQL-based rules ⚡️ 🔌 · ☆29 · updated May 15, 2020 (5 years ago)
- A solver for GCHQ's Christmas card puzzle · ☆12 · updated Dec 14, 2015 (10 years ago)
- Multi-stage, config-driven, SQL-based ETL framework using PySpark · ☆26 · updated Sep 16, 2019 (6 years ago)
- Friendly, Scala-like sequence interface · ☆12 · updated Jan 13, 2026 (2 months ago)
- ☆16 · updated Jun 27, 2020 (5 years ago)
- Apache Amaterasu · ☆56 · updated Oct 18, 2019 (6 years ago)
- ☆63 · updated Nov 8, 2019 (6 years ago)
- Export Airflow metrics (from MySQL) in Prometheus format · ☆29 · updated Apr 15, 2025 (11 months ago)
- The dbt-spark-livy adapter allows you to use dbt along with Apache Spark, by connecting via Apache Livy · ☆12 · updated Mar 30, 2023 (2 years ago)
- A script to automate and simplify simple system tasks, such as service control, package control, system monitoring, pinging etc. This scr… · ☆10 · updated Nov 27, 2022 (3 years ago)
- Akka plugin to collect various data about actors · ☆17 · updated Aug 19, 2024 (last year)
- Package to extend Airflow functionality with CWL v1.0 support · ☆12 · updated Jun 12, 2019 (6 years ago)
- ☆11 · updated Oct 11, 2022 (3 years ago)
- GraalVM native-image as a Docker container · ☆13 · updated Oct 11, 2018 (7 years ago)
- A Docker setup running Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop) · ☆12 · updated May 2, 2021 (4 years ago)