Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
☆70 · May 8, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-bigquery
Users who are interested in spark-bigquery are comparing it to the libraries listed below.
- Google BigQuery support for Spark, SQL, and DataFrames ☆155 · Dec 14, 2019 · Updated 6 years ago
- (no description) ☆31 · Oct 17, 2018 · Updated 7 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 · Oct 12, 2016 · Updated 9 years ago
- A handy Scala wrapper of the Google BigQuery API's Java Client Library. ☆34 · Sep 29, 2018 · Updated 7 years ago
- Using the Parquet file format (with Avro) to process data with Apache Flink ☆14 · Aug 17, 2015 · Updated 10 years ago
- Hadoop InputFormat for http://druid.io/ ☆10 · Oct 26, 2016 · Updated 9 years ago
- Google BigQuery API using service account credentials. ☆21 · Feb 22, 2016 · Updated 10 years ago
- An application that uses Cloud Dataflow and Cloud Build to copy/transfer BigQuery tables between locations/regions. ☆14 · Mar 17, 2021 · Updated 4 years ago
- Minitime - a Java Time wrapper for Scala and Scala.js ☆16 · Jan 17, 2020 · Updated 6 years ago
- (no description) ☆22 · Jun 9, 2016 · Updated 9 years ago
- Postgres extension drivers for quill ☆15 · Oct 31, 2016 · Updated 9 years ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. ☆421 · Updated this week
- Spark Structured Streaming State Tools ☆34 · Jul 3, 2020 · Updated 5 years ago
- Recipes and examples for Apache Spark ☆13 · Jan 21, 2015 · Updated 11 years ago
- Apache Zeppelin Service for Apache Ambari Service. Installation and management of Zeppelin via Ambari. ☆14 · Jan 23, 2016 · Updated 10 years ago
- Code to munge data between Kaggle .tsv Rotten Tomatoes Sentiment Analysis data set and Vowpal Wabbit ☆24 · Jun 22, 2014 · Updated 11 years ago
- Dockerfile for Apache Zeppelin ☆17 · Dec 9, 2015 · Updated 10 years ago
- Writing PySpark logs in Apache Spark and Databricks ☆17 · Jun 13, 2022 · Updated 3 years ago
- Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store ☆17 · Oct 20, 2022 · Updated 3 years ago
- Hive Storage Handler for interoperability between BigQuery and Apache Hive ☆19 · Jan 29, 2025 · Updated last year
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 · Feb 13, 2018 · Updated 8 years ago
- Coursera Machine Learning class examples in Spark ☆43 · Feb 14, 2014 · Updated 12 years ago
- BigQuery Schema Conversion Tool ☆23 · Oct 6, 2020 · Updated 5 years ago
- Sparklyr extension package to connect to Google BigQuery ☆19 · Oct 29, 2024 · Updated last year
- (no description) ☆21 · Mar 17, 2023 · Updated 2 years ago
- Library for organizing batch processing pipelines in Apache Spark ☆42 · Jan 4, 2017 · Updated 9 years ago
- ScalikeJDBC extension for Google BigQuery ☆18 · Mar 15, 2020 · Updated 5 years ago
- Ansible playbook for automated HDP 2.x deployment install with Kerberos ☆19 · Sep 8, 2016 · Updated 9 years ago
- Apache Calcite Adapter for Apache Kudu ☆28 · Sep 26, 2025 · Updated 5 months ago
- Spark DataFrames for earth observation data ☆19 · May 1, 2018 · Updated 7 years ago
- Spark data profiling utilities ☆23 · Nov 24, 2018 · Updated 7 years ago
- A sink to save Spark Structured Streaming DataFrame into Hive table ☆23 · May 7, 2018 · Updated 7 years ago
- A minimal seed template for an Akka gRPC with Scala build ☆19 · Jan 22, 2026 · Updated last month
- Discover Flink clusters on Hadoop YARN for Prometheus ☆23 · Aug 5, 2020 · Updated 5 years ago
- Small Docker image with Scala based on OracleJDK 8 (191MB) ☆21 · Feb 2, 2019 · Updated 7 years ago
- Flink Controller implements a Kubernetes Custom Controller (aka Kubernetes Operator) for Apache Flink ☆52 · Jan 26, 2026 · Updated last month
- Example code for building your own MemSQL Streamliner Pipelines ☆23 · Apr 18, 2017 · Updated 8 years ago
- An SBT resolver and publisher for Google Cloud Storage ☆23 · Dec 15, 2021 · Updated 4 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆96 · Jul 7, 2021 · Updated 4 years ago