tjhunter / dds_py
Data-driven software (python implementation)
☆25 Updated last year
Alternatives and similar repositories for dds_py:
Users who are interested in dds_py are comparing it to the libraries listed below.
- Data Sketches for Apache Spark ☆22 Updated 2 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆28 Updated 4 years ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 Updated 5 years ago
- real-time data + ML pipeline ☆54 Updated 3 weeks ago
- A Spark datasource for the HadoopOffice library ☆38 Updated 2 years ago
- type-class based data cleansing library for Apache Spark SQL ☆79 Updated 5 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 Updated last year
- Unity Catalog UI ☆39 Updated 5 months ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 Updated 5 months ago
- MLflow App Library ☆77 Updated 6 years ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 Updated 3 years ago
- This repository is no longer maintained. ☆15 Updated 2 years ago
- Data Catalog for Databases and Data Warehouses ☆32 Updated last year
- Observability Python library - Powered by Kensu ☆22 Updated 4 months ago
- ☆10 Updated 2 years ago
- Real-time anomaly detection using Kafka, KSQL User Defined Function and a pre-trained model ☆30 Updated last year
- Schema Registry integration for Apache Spark ☆40 Updated 2 years ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 4 years ago
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- ☕⛵WIP PySpark dependency management ☆22 Updated 6 years ago
- ☆54 Updated last year
- Paper: A Zero-rename committer for object stores ☆20 Updated 3 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated last week
- ☆30 Updated 3 years ago
- ☆19 Updated last year
- Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A… ☆13 Updated 8 years ago
- ☆37 Updated 5 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year