Apache Spark ETL Utilities
☆39Oct 23, 2024Updated last year
Alternatives and similar repositories for sope
Users that are interested in sope are comparing it to the libraries listed below
- Scala API for Apache Spark SQL higher-order functions☆14Aug 4, 2023Updated 2 years ago
- Connect your Spark Databricks clusters Log4J output to the Application Insights Appender☆19Aug 4, 2020Updated 5 years ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …☆12Feb 26, 2026Updated last week
- Quick Akka Micro Dag Prototype☆13Apr 8, 2016Updated 9 years ago
- Lab project to showcase Flink's performance differences between using a SQL query and implementing the same logic via the DataStream API☆14Apr 15, 2020Updated 5 years ago
- low-level helpers for Apache Spark libraries and tests☆16Dec 29, 2018Updated 7 years ago
- ☆20Dec 30, 2022Updated 3 years ago
- A Spark datasource for the HadoopOffice library☆36Sep 29, 2025Updated 5 months ago
- ☆45Apr 27, 2020Updated 5 years ago
- 🚀 Validation DSL for data pipelines☆24Jun 12, 2018Updated 7 years ago
- A simplified, lightweight ETL Framework based on Apache Spark☆587Jan 24, 2024Updated 2 years ago
- Utilities for writing tests that use Apache Spark.☆24Dec 29, 2018Updated 7 years ago
- Repository for my Analysis Service Azure pipelines tasks related to Azure Analysis Service or Power BI Premium☆27Mar 20, 2024Updated last year
- Import data from CSV files to Cassandra using Akka Streams with Java 8☆22May 19, 2017Updated 8 years ago
- Contains sample code for a lightning talk on HBase.☆39Oct 13, 2020Updated 5 years ago
- Generate schema sources for Scala, Java and Elm from an openapi 3.0 spec.☆26Feb 2, 2026Updated last month
- Build configuration-driven ETL pipelines on Apache Spark☆161Oct 4, 2022Updated 3 years ago
- Apache Presto GUI that provides quick queries and scheduled tasks☆23Apr 27, 2018Updated 7 years ago
- Components for building stream loaders from Kafka to arbitrary storages☆38Jan 23, 2026Updated last month
- Scala and SQL happy together.☆29Dec 13, 2016Updated 9 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆122Updated this week
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆29Nov 4, 2024Updated last year
- Test your Hive scripts inside your favorite IDE with HiveQLUnit! Increase your developers productivity by testing on all operating system…☆40Oct 13, 2020Updated 5 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌☆29May 15, 2020Updated 5 years ago
- The ZetaSQL Toolkit is a library that helps users use ZetaSQL Java API to perform SQL analysis for multiple query engines, including BigQ…☆41Oct 28, 2025Updated 4 months ago
- A giter8 template for Spark SBT projects☆72Mar 20, 2021Updated 4 years ago
- This extension adds Azure Data Factory release tasks to Azure Pipelines.☆28Jun 9, 2022Updated 3 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from…☆35Jan 5, 2023Updated 3 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications☆36Dec 15, 2024Updated last year
- NSI power site project☆17May 5, 2012Updated 13 years ago
- ☆11Mar 27, 2024Updated last year
- ☆14Nov 10, 2025Updated 3 months ago
- Learning PySpark video series☆11Mar 5, 2018Updated 8 years ago
- Airbyte is the go-sdk/cdk to help build connectors quickly in go. This package abstracts away much of the "protocol" away from the user a…☆41Feb 22, 2024Updated 2 years ago
- Typed Spreadsheet UI library for ScalaJS☆40Dec 7, 2022Updated 3 years ago
- Schema Registry integration for Apache Spark☆40Nov 16, 2022Updated 3 years ago
- Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm☆104Jan 22, 2024Updated 2 years ago
- A timer module for Redis☆11Oct 16, 2019Updated 6 years ago
- Repo to hold code Artifacts for WAF☆10Sep 14, 2022Updated 3 years ago