kwanUm / awesome-data-quality
Curated list of tools and frameworks assisting in monitoring data quality
☆12 Updated 3 years ago
Alternatives and similar repositories for awesome-data-quality:
Users who are interested in awesome-data-quality are comparing it to the libraries listed below.
- A Kubernetes operator for managing Prefect servers and work pools ☆13 Updated this week
- Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications ☆95 Updated 7 months ago
- Entity resolution for everyone. Minimal. No dependencies. ☆10 Updated 3 weeks ago
- HyPSTER - HyperParameter optimization on STERoids ☆48 Updated 5 months ago
- Read infrastructure data from your cloud ☁️ and export it to a SQL database 📋. ☆33 Updated last year
- Real-time deduplication and temporal joins for streaming data ☆27 Updated this week
- dpq is an open-source python library that makes prompt-based data transformations and feature engineering easy ☆24 Updated last year
- Next generation compute platform for the post-modern data stack ☆15 Updated this week
- Metafeature Extraction for Unstructured Data ☆101 Updated last month
- High-scale LLM gateway, written in Rust. OpenTelemetry-based observability included ☆63 Updated 2 weeks ago
- Sord Data Fabric: A Vue 3 frontend with a Python WebSocket server, leveraging a distributed architecture with DeltaLake and DuckDB worker… ☆18 Updated last year
- A tool to provision MLOps environments in Azure ☆31 Updated last year
- Tutorials, templates for running glassflow pipelines ☆30 Updated 2 months ago
- A library to find and visualise the most interesting slices in multidimensional data ☆108 Updated last month
- IbisML is a library for building scalable ML pipelines using Ibis. ☆108 Updated 4 months ago
- Progzee is a Python library for simplifying IP proxy usage in HTTP requests. ☆16 Updated 2 months ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 Updated 8 months ago
- Contribute to dlt verified sources 🔥 ☆84 Updated this week
- Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows). ☆17 Updated 11 months ago
- List of entity resolution software and resources. ☆64 Updated 2 months ago
- 🛡️ Managed isolated environments for Python ☆94 Updated 3 weeks ago
- DuckDB Community Extension to prompt LLMs from SQL ☆46 Updated 4 months ago
- Python wrapper for the Sling CLI tool ☆50 Updated 3 weeks ago
- Airbyte made simple (no UI, no database, no cluster) ☆171 Updated 3 weeks ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆50 Updated last year
- A Python framework for defining and querying BI models in your data warehouse ☆166 Updated 3 months ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆145 Updated last month
- The open source metrics layer ☆38 Updated last week
- A playground for running duckdb as a stateless query engine over a data lake ☆199 Updated last year