GoogleCloudDataproc / spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
☆369 · Updated last week
Related projects:
- Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow. ☆142 · Updated 3 months ago
- Data Quality Engine for BigQuery ☆255 · Updated 2 months ago
- Dataproc templates and pipelines for solving simple in-cloud data tasks ☆116 · Updated this week
- Snowflake Data Source for Apache Spark. ☆213 · Updated this week
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks ☆390 · Updated this week
- Cloud Dataproc: Samples and Utils ☆198 · Updated last month
- Spark style guide ☆255 · Updated last year
- ☆195 · Updated 11 months ago
- Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform. ☆279 · Updated last week
- (Legacy) Command Line Interface for Databricks ☆383 · Updated 11 months ago
- ☆375 · Updated this week
- Apache Airflow integration for dbt ☆392 · Updated 4 months ago
- Data ingestion library for Amundsen to build graph and search index ☆206 · Updated 6 months ago
- Essential Spark extensions and helper methods ✨😲 ☆747 · Updated 2 years ago
- A dbt adapter for Databricks. ☆211 · Updated this week
- Airflow Unit Tests and Integration Tests ☆254 · Updated last year
- PySpark test helper methods with beautiful error messages ☆583 · Updated last week
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆429 · Updated this week
- Performant Redshift data source for Apache Spark ☆135 · Updated last month
- Run in all nodes of your cluster before the cluster starts - lets you customize your cluster ☆587 · Updated last week
- A simplified, lightweight ETL Framework based on Apache Spark ☆581 · Updated 7 months ago
- ☆74 · Updated 2 months ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆692 · Updated last month
- Python API for Deequ ☆704 · Updated last week
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆341 · Updated 3 months ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆624 · Updated last week
- A simple Spark-powered ETL framework that just works 🍺 ☆177 · Updated 9 months ago
- CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer ☆348 · Updated this week
- Avro SerDe for Apache Spark structured APIs. ☆228 · Updated last month
- Great Expectations Airflow operator ☆158 · Updated 2 weeks ago