vertica / spark-connector
This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
☆20Updated 2 weeks ago
Related projects:
- A library that provides useful extensions to Apache Spark and PySpark.☆193Updated last week
- ☆13Updated last month
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆111Updated last month
- A library that brings useful functions from various modern database management systems to Apache Spark☆53Updated last year
- Snowflake Data Source for Apache Spark.☆213Updated this week
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them☆135Updated 10 months ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆82Updated 5 months ago
- The Internals of Spark on Kubernetes☆71Updated 2 years ago
- Extensible Rules Engine for custom Dataframe / Dataset validation☆134Updated 4 months ago
- Spline agent for Apache Spark☆183Updated last week
- ACID Data Source for Apache Spark based on Hive ACID☆97Updated 3 years ago
- DataQuality for BigData☆139Updated 9 months ago
- A simple Spark-powered ETL framework that just works 🍺☆177Updated 9 months ago
- ☆63Updated 4 years ago
- ☆77Updated last year
- Spark-Radiant is Apache Spark Performance and Cost Optimizer☆25Updated last year
- A Python client for Apache Livy, enabling use of remote Apache Spark clusters.☆70Updated 2 years ago
- Adapter for dbt that executes dbt pipelines on Apache Flink☆80Updated 6 months ago
- Examples of Spark 3.0☆46Updated 3 years ago
- Magic to help Spark pipelines upgrade☆33Updated last month
- CLI tool to bulk migrate tables from one catalog to another without a data copy☆51Updated this week
- Spark connector for SFTP☆100Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive☆183Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆86Updated 6 months ago
- Visualize column-level data lineage in Spark SQL☆85Updated 2 years ago
- Storage connector for Trino☆90Updated 3 weeks ago
- The Internals of Delta Lake☆180Updated last month
- Extensible streaming ingestion pipeline on top of Apache Spark☆43Updated 6 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆96Updated last year
- A Spark datasource for the HadoopOffice library☆39Updated last year