tjhunter / dds_py
Data-driven software (python implementation)
☆25 · Updated last year
Related projects
Alternatives and complementary repositories for dds_py
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 · Updated 5 years ago
- Code that was used as an example during the Data+AI Summit 2020 ☆15 · Updated 3 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆56 · Updated last year
- Parquet Command-line Tools ☆18 · Updated 8 years ago
- The sane way of building a data layer in Airflow ☆24 · Updated 4 years ago
- Real-time anomaly detection using Kafka, KSQL User Defined Function and a pre-trained model ☆30 · Updated 10 months ago
- A Spark datasource for the HadoopOffice library ☆39 · Updated 2 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Updated last year
- type-class based data cleansing library for Apache Spark SQL ☆79 · Updated 5 years ago
- UI to run SQL on Delta Lake tables and visualize the variations of the result among tables versions ☆11 · Updated 4 years ago
- ☕⛵ WIP PySpark dependency management ☆22 · Updated 6 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆25 · Updated this week
- ☆10 · Updated 2 years ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 · Updated 3 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Updated 2 months ago
- phData Pulse application log aggregation and monitoring ☆13 · Updated 4 years ago
- Testing Scala code with scalatest ☆11 · Updated 2 years ago
- Sketching data structures for scala, including t-digest ☆15 · Updated 3 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆28 · Updated 4 years ago
- Utilities for writing tests that use Apache Spark. ☆24 · Updated 5 years ago
- Schema Registry integration for Apache Spark ☆39 · Updated last year
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 5 years ago
- Connect DBVisualizer to Hortonworks HiveServer2 ☆9 · Updated 9 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ☆16 · Updated 3 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆42 · Updated 9 months ago
- ☆13 · Updated last year
- Observability Python library - Powered by Kensu ☆22 · Updated 3 weeks ago
- Data validation library for PySpark 3.0.0 ☆34 · Updated 2 years ago
- real-time data + ML pipeline ☆54 · Updated 2 weeks ago