open-datastudio / datastudio
Data science, machine learning tools on the cloud
☆15 · Updated 4 years ago
Alternatives and similar repositories for datastudio:
Users who are interested in datastudio are comparing it to the libraries listed below.
- A Spark datasource for the HadoopOffice library ☆39 · Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆43 · Updated last year
- Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple… ☆26 · Updated 3 years ago
- Python - Java/Scala API for the Hopsworks feature store ☆54 · Updated this week
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆95 · Updated 2 weeks ago
- Apache DataLab (incubating) ☆153 · Updated last year
- A tool to install, configure and manage Trino installations ☆27 · Updated 2 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆114 · Updated this week
- A plugin for Airflow that creates and manages your DAGs with a web UI. ☆20 · Updated 7 years ago
- This library is an ongoing effort towards bringing the data exchanging ability between Java/Scala and Python. PyJava introduces Apache A… ☆46 · Updated last year
- DataQuality for BigData ☆143 · Updated last year
- PostgreSQL and GreenPlum Data Source for Apache Spark ☆35 · Updated 11 months ago
- A bridge to Apache Atlas for provenance metadata created in the course of using Apache NiFi ☆15 · Updated 2 years ago
- Example of a simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 · Updated 3 years ago
- Qubole Streaminglens tool for tuning Spark Structured Streaming pipelines ☆17 · Updated 5 years ago
- An integrated and collaborative cloud environment for building and running Spark applications on PKS/Kubernetes ☆81 · Updated 4 years ago
- Multi-node Presto cluster in Docker containers ☆124 · Updated 2 years ago
- RocksDB state storage implementation for Structured Streaming ☆17 · Updated 4 years ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 · Updated 5 years ago
- Demo application for GRADOOP operators ☆23 · Updated 4 years ago
- Spline agent for Apache Spark ☆190 · Updated 3 weeks ago
- Ranger Hive Metastore Plugin ☆18 · Updated last year
- Pipeline library for StreamSets Data Collector and Transformer ☆32 · Updated 2 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 · Updated 3 years ago
- Cask Hydrator Plugins Repository ☆67 · Updated this week