jerzygangi / forklift
ETL for Spark and Airflow
☆24 · Updated 6 years ago
Related projects:
- Apache Spark ETL Utilities ☆40 · Updated last year
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆75 · Updated 5 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Updated 5 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆41 · Updated 7 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆60 · Updated 2 weeks ago
- ☆47 · Updated 4 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 · Updated 7 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 3 years ago
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different. ☆28 · Updated 6 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 · Updated 8 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Updated 5 years ago
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 · Updated 3 years ago
- ☆71 · Updated 3 years ago
- Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm ☆105 · Updated 7 months ago
- Examples for High Performance Spark ☆15 · Updated 3 weeks ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 · Updated 2 years ago
- A Spark datasource for the HadoopOffice library ☆39 · Updated last year
- A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster. ☆48 · Updated 8 years ago
- Spark structured streaming with Kafka data source and writing to Cassandra ☆64 · Updated 4 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆49 · Updated 8 months ago
- type-class based data cleansing library for Apache Spark SQL ☆79 · Updated 5 years ago
- Schema Registry integration for Apache Spark ☆39 · Updated last year
- ☆49 · Updated this week
- Example project to show how to use Spark to read and write Avro/Parquet files ☆50 · Updated 11 years ago
- Support Highcharts in Apache Zeppelin ☆81 · Updated 6 years ago
- Utilities for writing tests that use Apache Spark. ☆24 · Updated 5 years ago
- High performance HBase / Spark SQL engine ☆28 · Updated 2 years ago
- Spark to Tableau Extractor library ☆18 · Updated 6 years ago
- Airflow workflow management platform chef cookbook. ☆67 · Updated 5 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Updated 10 months ago