Extensible streaming ingestion pipeline on top of Apache Spark
☆46 Jul 17, 2025 Updated 7 months ago
Alternatives and similar repositories for hyperdrive
Users who are interested in hyperdrive are comparing it to the libraries listed below.
- Dynamic Conformance Engine ☆32 Oct 17, 2025 Updated 4 months ago
- Scala API for Apache Spark SQL high-order functions ☆14 Aug 4, 2023 Updated 2 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 Nov 4, 2024 Updated last year
- Resilient data pipeline framework running on Apache Spark ☆26 Updated this week
- Avro SerDe for Apache Spark structured APIs. ☆241 Jun 10, 2025 Updated 8 months ago
- Data quality tools for Big Data ☆19 Oct 10, 2019 Updated 6 years ago
- Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines ☆17 Jan 21, 2020 Updated 6 years ago
- A JDBC streaming source for Spark ☆10 Feb 19, 2024 Updated 2 years ago
- Spark app to merge different schemas ☆23 Dec 21, 2020 Updated 5 years ago
- Data Lineage Tracking And Visualization Solution ☆656 Feb 16, 2026 Updated last week
- Efficiently automate your release note generation with 'generate-release-notes'. This GH action scans your target GitHub repository's iss… ☆12 Updated this week
- Small wrapper to enable running arbitrary docker run commands in nomad. ☆10 Oct 28, 2017 Updated 8 years ago
- ☆15 Nov 20, 2024 Updated last year
- EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Goo… ☆44 Aug 26, 2024 Updated last year
- Package to extend Airflow functionality with CWL v1.0 support ☆12 Jun 12, 2019 Updated 6 years ago
- Cloud based Data Platform based on Apache Spark ☆27 Feb 17, 2026 Updated last week
- Big Data Processing Framework - Unified Data API or SQL on Any Storage ☆251 Jul 10, 2025 Updated 7 months ago
- WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging … ☆31 Oct 28, 2025 Updated 3 months ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆29 May 15, 2020 Updated 5 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w… ☆16 Oct 3, 2025 Updated 4 months ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 Sep 6, 2024 Updated last year
- Akka plugin to collect various data about actors ☆17 Aug 19, 2024 Updated last year
- A simplified, lightweight ETL Framework based on Apache Spark ☆587 Jan 24, 2024 Updated 2 years ago
- ☆16 Apr 9, 2019 Updated 6 years ago
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Jul 11, 2018 Updated 7 years ago
- Hadoop utility to compact small files ☆18 Feb 16, 2026 Updated last week
- Utility for benchmarking changes in Spark using TPC-DS workloads ☆16 Jun 3, 2021 Updated 4 years ago
- Single node, in-memory DataFrame analytics library. ☆43 Sep 15, 2025 Updated 5 months ago
- Apache Amaterasu ☆56 Oct 18, 2019 Updated 6 years ago
- ☆17 Aug 8, 2019 Updated 6 years ago
- Deriving Spark DataFrame schemas from case classes ☆44 Jun 24, 2024 Updated last year
- Spark SQL index for Parquet tables ☆134 May 6, 2021 Updated 4 years ago
- A Scala Kubernetes client library ☆89 Aug 11, 2025 Updated 6 months ago
- spark job, sangria server, and react front-end for Word2Vec models ☆15 Nov 1, 2016 Updated 9 years ago
- Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka ☆25 Oct 16, 2020 Updated 5 years ago
- 🚀 Validation DSL for data pipelines ☆24 Jun 12, 2018 Updated 7 years ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆94 May 9, 2025 Updated 9 months ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆303 Oct 30, 2025 Updated 3 months ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 Sep 16, 2019 Updated 6 years ago