GoogleCloudDataproc / jupyterhub-dataprocspawner
☆14 · Updated 2 years ago
Alternatives and similar repositories for jupyterhub-dataprocspawner
Users who are interested in jupyterhub-dataprocspawner are comparing it to the repositories listed below:
- Hive Storage Handler for interoperability between BigQuery and Apache Hive ☆19 · Updated 2 months ago
- Cloud Spanner Connector for Apache Spark ☆17 · Updated 3 months ago
- ☆37 · Updated 5 years ago
- Oozie Workflow to Airflow DAGs migration tool ☆87 · Updated last month
- ☆54 · Updated 7 years ago
- Set of iPython and Jupyter extensions to improve user experience ☆50 · Updated 5 years ago
- Tools for creating Dataproc custom images ☆32 · Updated last week
- A pyspark lib to validate data quality ☆18 · Updated 2 years ago
- Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration. ☆70 · Updated last year
- Sample code with integration between Data Catalog and Hive data source. ☆25 · Updated 2 months ago
- An example PySpark project with pytest ☆16 · Updated 7 years ago
- Utilities to work with Scala/Java code with py4j ☆40 · Updated last year
- ☕⛵WIP PySpark dependency management ☆22 · Updated 6 years ago
- Simple Spark example of generating table stats for use of data quality checks ☆28 · Updated 7 years ago
- This is the support code and solutions for the NYC Taxi Tycoon Dataflow Codelab ☆61 · Updated 5 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Updated last year
- Snippets of code used in blog posts and other media. ☆13 · Updated last week
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 · Updated 7 years ago
- Make your libraries magically appear in Databricks. ☆47 · Updated last year
- Apache Spark AWS Lambda Executor (SAMBA) ☆44 · Updated 6 years ago
- UI to run SQL on Delta Lake tables and visualize the variations of the result among tables versions ☆12 · Updated 5 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- Ephemeral Hadoop clusters using Google Compute Platform ☆135 · Updated 3 years ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. ☆96 · Updated 4 years ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- [ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples ☆70 · Updated 4 years ago
- KSQL Syntax Highlighting for VSCode ☆17 · Updated 2 years ago
- Tutorial for Deploying Anaconda Cluster and PySpark on top of Red Hat Storage GlusterFS ☆8 · Updated 10 years ago
- Jupyter extensions for SWAN ☆58 · Updated last week
- Navigator SDK ☆22 · Updated 6 years ago