spotify/spark-bigquery

spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames

☆156

Alternatives and similar repositories for spark-bigquery

Users who are interested in spark-bigquery compare it to the libraries listed below.

samelamin / spark-bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
☆70 · May 8, 2023 · Updated 3 years ago
yu-iskw / spark-streaming-with-google-cloud-example
An example of integrating Spark Streaming with Google Pub/Sub and Google Datastore
☆16 · Mar 22, 2017 · Updated 9 years ago
seratch / bigquery4s
A handy Scala wrapper of Google BigQuery API's Java Client Library.
☆34 · Sep 29, 2018 · Updated 7 years ago
GoogleCloudDataproc / hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
☆19 · Jan 29, 2025 · Updated last year
GoogleCloudDataproc / hadoop-connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
☆292 · Updated this week
nevillelyh / parquet-extra
A collection of Apache Parquet add-on modules
☆31 · Updated this week
collectivemedia / spark-ext
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago
spotify / scio
A Scala API for Apache Beam and Google Cloud Dataflow.
☆2,625 · Updated this week
miraisolutions / spark-bigquery
Google BigQuery data source for Apache Spark
☆17 · Oct 1, 2024 · Updated last year
spotify / spydra
Ephemeral Hadoop clusters using Google Compute Platform
☆136 · Mar 31, 2022 · Updated 4 years ago
prestodb / presto-hadoop-apache
Shaded version of Apache Hadoop 2.x for Presto
☆16 · Sep 16, 2025 · Updated 10 months ago
GoogleCloudDataproc / spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
☆425 · Jul 17, 2026 · Updated last week
spotify / ratatool
A tool for data sampling, data generation, and data diffing
☆349 · Updated this week
spotify / hype
Runs JVM closures in Docker containers on Kubernetes
☆38 · Mar 23, 2018 · Updated 8 years ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,553 · Apr 20, 2026 · Updated 3 months ago
googledatalab / datalab
Interactive tools and developer experiences for Big Data on Google Cloud Platform.
☆978 · Sep 2, 2022 · Updated 3 years ago
alexvanboxel / airflow-gcp-k8s
☆54 · Aug 3, 2017 · Updated 8 years ago
alexvanboxel / airflow-gcp-examples
Repository with examples and smoke tests for the GCP Airflow operators and hooks
☆152 · Jan 15, 2017 · Updated 9 years ago
GoogleCloudDataproc / initialization-actions
Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
☆597 · Updated this week
darkjh / scalaflow
Fluent Scala DSL for Google's Cloud Dataflow SDK
☆56 · Aug 2, 2015 · Updated 10 years ago
wepay / kafka-connect-bigquery
DEPRECATED. PLEASE USE https://github.com/confluentinc/kafka-connect-bigquery. A Kafka Connect BigQuery sink connector
☆151 · Mar 4, 2024 · Updated 2 years ago
GoogleCloudDataproc / jupyterhub-dataprocspawner
☆14 · May 27, 2022 · Updated 4 years ago
iconara / bigshift
A tool for moving tables from Redshift to BigQuery
☆65 · Jan 20, 2019 · Updated 7 years ago
memsql / singlestore-spark-connector
A connector for SingleStore and Spark
☆164 · Jul 1, 2026 · Updated 3 weeks ago
GoogleCloudPlatform / dataflow-prediction-example
☆84 · Jan 26, 2026 · Updated 5 months ago
GoogleCloudPlatform / spark-examples
Spark pipelines that correspond to a series of Dataflow examples.
☆27 · May 5, 2019 · Updated 7 years ago
amplab / keystone
Simplifying robust end-to-end machine learning on Apache Spark.
☆473 · Apr 18, 2017 · Updated 9 years ago
julianpeeters / avro-scala-macro-annotations
Compile-time tools for working with Avros in Scala
☆55 · Dec 10, 2017 · Updated 8 years ago
GoogleCloudPlatform / pontem
Open source tools for Google Cloud Storage and Databases.
☆65 · May 1, 2024 · Updated 2 years ago
spotify / featran
A Scala feature transformation library for data science and machine learning
☆475 · Feb 7, 2025 · Updated last year
collectivemedia / spark-hyperloglog
Interactive Audience Analytics with Spark and HyperLogLog
☆55 · Oct 14, 2015 · Updated 10 years ago
databricks / tensorframes
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark
☆744 · Jul 30, 2024 · Updated last year
spotify / async-google-pubsub-client
[SUNSET] Async Google Pubsub Client
☆158 · Mar 18, 2023 · Updated 3 years ago
spotify / gcs-tools
GCS support for avro-tools, parquet-tools and protobuf
☆80 · Jul 14, 2026 · Updated last week
GoogleCloudPlatform / dataflow-opinion-analysis
Opinion Analysis of News, Threaded Conversations, and User Generated Content
☆110 · Sep 19, 2024 · Updated last year
mitodl / edx2bigquery
Tool to convert & load data from edX platform into BigQuery
☆29 · Dec 1, 2023 · Updated 2 years ago
bokeh / bokeh-scala
Scala bindings for Bokeh plotting library
☆138 · Oct 11, 2023 · Updated 2 years ago
twitter-archive / jaqen
A type-safe heterogeneous Map or a Named field Tuple
☆35 · Nov 8, 2014 · Updated 11 years ago
hakobera / luigi-bigquery
Luigi integration for Google BigQuery
☆15 · Nov 18, 2015 · Updated 10 years ago