fredrikhgrelland / data-mesh
A cloud native data mesh implementation
☆12 Updated 4 years ago
Alternatives and similar repositories for data-mesh:
Users who are interested in data-mesh are comparing it to the libraries listed below.
- Dremio Flight connector. Access Dremio using Arrow Flight ☆40 Updated 4 years ago
- Dask integration for Snowflake ☆30 Updated 5 months ago
- Data Catalog for Databases and Data Warehouses ☆34 Updated last year
- A JupyterLab extension providing SQL formatting, auto-completion, and syntax highlighting for Spark SQL and Trino ☆87 Updated 3 weeks ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 Updated 5 years ago
- Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning. ☆65 Updated 4 years ago
- A Python package that parses SQL and converts it to Ibis expressions ☆54 Updated last year
- ☆68 Updated 3 months ago
- Dockerized setup for testing code on realistic Hadoop clusters ☆27 Updated 4 years ago
- Deploy Dask on YARN clusters ☆69 Updated 8 months ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 Updated 4 years ago
- A collection of Python utility functions ☆11 Updated 9 months ago
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- A Python package to create a database on the platform using our moj data warehousing framework ☆21 Updated 7 months ago
- This repository is no longer maintained. ☆15 Updated 3 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ☆17 Updated 3 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks ☆50 Updated last month
- Spawn JupyterHub single user notebook servers in Hadoop/YARN containers. ☆19 Updated this week
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆45 Updated 4 months ago
- Real-time data + ML pipeline ☆54 Updated 2 weeks ago
- Data Lineage Tracing Library ☆22 Updated 3 years ago
- Materials for Apache Arrow workshop at VLDB 2019 ☆42 Updated 4 years ago
- Ibis analytics, with Ibis (and more!) ☆21 Updated 7 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- CLI for data platform ☆19 Updated last year
- Asynchronous actions for PySpark ☆47 Updated 3 years ago
- Code examples for the Introduction to Kubeflow course ☆14 Updated 4 years ago
- A PySpark library to validate data quality ☆18 Updated 2 years ago
- Function dependencies resolution and execution ☆70 Updated 4 years ago