Scala API for Apache Spark SQL higher-order functions
☆14 · Aug 4, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-hofs
Users interested in spark-hofs also compare it to the libraries listed below.
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Make Structs Easy (MSE) ☆18 · Jun 22, 2020 · Updated 5 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 7 months ago
- Dynamic Conformance Engine ☆32 · Oct 17, 2025 · Updated 4 months ago
- Resilient data pipeline framework running on Apache Spark ☆26 · Updated this week
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 · Mar 14, 2021 · Updated 4 years ago
- Open source task scheduler with dependency management ☆15 · Jul 1, 2018 · Updated 7 years ago
- Spark package to "plug" holes in data using SQL-based rules ⚡️ 🔌 ☆29 · May 15, 2020 · Updated 5 years ago
- Apache Spark ETL Utilities ☆39 · Oct 23, 2024 · Updated last year
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 · Apr 24, 2024 · Updated last year
- Qubole Streaminglens tool for tuning Spark Structured Streaming pipelines ☆17 · Jan 21, 2020 · Updated 6 years ago
- An open-source dataworks platform ☆21 · Jun 4, 2021 · Updated 4 years ago
- Maelstrom is an open-source Kafka integration with Spark that is designed to be developer-friendly, high performance (millisecond stream … ☆22 · Feb 6, 2017 · Updated 9 years ago
- Library and framework for building fast, scalable, fault-tolerant data APIs based on Akka, Avro, ZooKeeper and Kafka ☆25 · Oct 16, 2020 · Updated 5 years ago
- Marquez Web UI ☆21 · Nov 13, 2020 · Updated 5 years ago
- Utilities for writing tests that use Apache Spark. ☆24 · Dec 29, 2018 · Updated 7 years ago
- An open source enterprise data warehousing and analysis platform. ☆22 · Nov 8, 2021 · Updated 4 years ago
- Avro SerDe for Apache Spark structured APIs. ☆241 · Jun 10, 2025 · Updated 8 months ago
- Build configuration-driven ETL pipelines on Apache Spark ☆161 · Oct 4, 2022 · Updated 3 years ago
- Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple… ☆26 · Jun 7, 2021 · Updated 4 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆122 · Updated this week
- Clear, practical guidance for building Akka applications ☆31 · Jan 17, 2022 · Updated 4 years ago
- Nested array transformation helper extensions for Apache Spark ☆37 · Aug 4, 2023 · Updated 2 years ago
- Hive-JDBC-Proxy is a high-performance proxy service for HiveServer2 and Spark ThriftServer, with load balancing and rule-based forwarding of Hive JDBC Client requests to HiveServer2 and Spark ThriftServer. ☆33 · Apr 12, 2022 · Updated 3 years ago
- Friendly, Scala-like sequence interface ☆12 · Jan 13, 2026 · Updated last month
- An ad hoc reporting client based on the Pentaho Metadata Layer ☆32 · Mar 20, 2013 · Updated 12 years ago
- Command-line tool to find the nearest retail store ☆10 · Jan 18, 2017 · Updated 9 years ago
- ☆13 · Nov 10, 2025 · Updated 3 months ago
- Big Data Processing Framework - Unified Data API or SQL on Any Storage ☆251 · Jul 10, 2025 · Updated 7 months ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 · Sep 8, 2022 · Updated 3 years ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆43 · Feb 11, 2026 · Updated 2 weeks ago
- A collection of Apache Parquet add-on modules ☆30 · Feb 12, 2026 · Updated 2 weeks ago
- Big Data ETL and utilities for Hadoop MapReduce, Spark and Storm ☆104 · Jan 22, 2024 · Updated 2 years ago
- Common utilities for Apache Kafka ☆36 · Aug 7, 2023 · Updated 2 years ago
- ☆10 · Aug 13, 2021 · Updated 4 years ago
- Kafka Connect JSONata Transform ☆12 · Feb 24, 2025 · Updated last year
- Repository holding code artifacts for WAF ☆10 · Sep 14, 2022 · Updated 3 years ago
- Python JDBC Connector ☆11 · Aug 9, 2019 · Updated 6 years ago
- A Flink JDBC connector that supports database and table sharding ☆10 · Dec 31, 2021 · Updated 4 years ago