GoogleCloudDataproc / jupyterhub-dataprocspawner
☆14 · Updated 2 years ago
Alternatives and similar repositories for jupyterhub-dataprocspawner:
Users who are interested in jupyterhub-dataprocspawner are comparing it to the repositories listed below.
- Hive Storage Handler for interoperability between BigQuery and Apache Hive ☆19 · Updated last month
- Hadoop Data Pipeline using Falcon ☆15 · Updated 8 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Updated last year
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 · Updated last month
- Navigator SDK ☆22 · Updated 6 years ago
- [ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples ☆70 · Updated 4 years ago
- Profiles the data, validates the schema, runs data quality checks, and produces a report ☆20 · Updated 5 years ago
- ☆54 · Updated 7 years ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- Single view demo ☆14 · Updated 9 years ago
- Simple Spark example of generating table stats for use in data quality checks ☆28 · Updated 7 years ago
- Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration. ☆70 · Updated last year
- An example PySpark project with pytest ☆17 · Updated 7 years ago
- Cask Hydrator Plugins Repository ☆68 · Updated this week
- HDF masterclass materials ☆28 · Updated 8 years ago
- Spark Structured Streaming State Tools ☆34 · Updated 4 years ago
- Demos around Ambari Views, Services, Blueprints ☆63 · Updated 9 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Updated 5 years ago
- ☕⛵ WIP PySpark dependency management ☆22 · Updated 6 years ago
- A Spark datasource for the HadoopOffice library ☆38 · Updated 2 years ago
- An integrated and collaborative cloud environment for building and running Spark applications on PKS/Kubernetes ☆82 · Updated 5 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 · Updated last year
- This is the support code and solutions for the NYC Taxi Tycoon Dataflow Codelab ☆60 · Updated 5 years ago
- This repository supports the partner demonstration of the Apache Atlas project. ☆30 · Updated 9 years ago
- Demonstrates calling a Scala UDF from Python using spark-submit with an EGG and JAR ☆21 · Updated 5 years ago
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 · Updated 7 years ago
- Star Schema Benchmark using the Hive / Druid Integration ☆30 · Updated 7 years ago
- Magic to help Spark pipelines upgrade ☆34 · Updated 5 months ago
- Apache Atlas development image for the Rokku project: https://github.com/ing-bank/rokku ☆21 · Updated 4 years ago
- A PySpark library to validate data quality ☆18 · Updated 2 years ago