fredrikhgrelland / data-mesh
A cloud native data mesh implementation
☆12 Updated 4 years ago
Alternatives and similar repositories for data-mesh
Users who are interested in data-mesh are comparing it to the repositories listed below.
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- Dremio Flight connector. Access Dremio using Arrow flight ☆40 Updated 4 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ☆17 Updated 4 years ago
- Materials for Apache Arrow workshop at VLDB 2019 ☆42 Updated 4 years ago
- The open-source Useful SDK. One python decorator in the Useful library allows for full observability of Python functions within an ETL. ☆20 Updated last year
- An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks ☆52 Updated 2 weeks ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆45 Updated this week
- A series of workshop modules introducing Feast feature store. ☆19 Updated 3 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆59 Updated last year
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 Updated 6 months ago
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆33 Updated last year
- ☆106 Updated 2 years ago
- real-time data + ML pipeline ☆54 Updated this week
- Data Catalog for Databases and Data Warehouses ☆35 Updated last year
- Apache DataLab (incubating) ☆153 Updated last year
- A parser for SQL, which gives back identifiers and a hierarchical model for lineage tracking ☆20 Updated 7 years ago
- Deploy dask on YARN clusters ☆69 Updated 10 months ago
- big data technologies comparisons for cleaning, manipulating and generally wrangling data for the purpose of analysis and machine learning. ☆65 Updated 5 years ago
- Python binding for DataFusion ☆59 Updated 2 years ago
- Dockerized setup for testing code on realistic hadoop clusters ☆27 Updated 4 years ago
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … ☆143 Updated 11 months ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 Updated 2 years ago
- A JupyterLab extension providing SQL formatter, auto-completion, syntax highlighting, Spark SQL and Trino ☆88 Updated 3 weeks ago
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… ☆29 Updated 2 years ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆37 Updated 4 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 Updated 9 months ago
- The sane way of building a data layer in Airflow ☆24 Updated 5 years ago
- Fake Pandas / PySpark DataFrame creator ☆47 Updated last year
- Helpers & syntactic sugar for PySpark. ☆62 Updated 2 years ago
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆126 Updated 3 years ago