kwanUm / awesome-data-quality
Curated list of tools and frameworks assisting in monitoring data quality
☆12 Updated 2 years ago
Alternatives and similar repositories for awesome-data-quality:
Users who are interested in awesome-data-quality are comparing it to the libraries listed below:
- Metafeature Extraction for Unstructured Data ☆101 Updated 2 weeks ago
- Data Tools Subjective List ☆83 Updated last year
- FlockMTL: DuckDB extension to seamlessly combine analytics and semantic analysis using language models (LMs) ☆108 Updated 2 months ago
- Read infrastructure data from your cloud ☁️ and export it to a SQL database 📋. ☆33 Updated last year
- Analyzing hacker news in real-time with Bytewax and Proton ☆39 Updated last year
- List of entity resolution software and resources. ☆63 Updated last month
- A tool to provision MLOps environments in Azure ☆31 Updated last year
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️ ☆16 Updated 2 weeks ago
- A pytest plugin for running and analyzing LLM evaluation tests. ☆116 Updated last month
- ☆93 Updated last year
- Python Library for FeatureOps ☆65 Updated this week
- Entity resolution for everyone. Minimal. No dependencies. ☆10 Updated 7 months ago
- Progzee is a Python library for simplifying IP proxy usage in HTTP requests. ☆16 Updated last month
- The bridge to effortless multi-engine data applications, currently supports Snowflake ❄️ and DuckDB 🦆 ☆173 Updated this week
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ☆95 Updated this week
- dpq is an open-source python library that makes prompt-based data transformations and feature engineering easy ☆24 Updated 11 months ago
- Transform your pythonic research to an artifact that engineers can deploy easily. ☆151 Updated last week
- Boiling Insights - From raw S3 data to charts in seconds ☆17 Updated 3 months ago
- Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications ☆91 Updated 5 months ago
- Anomstack - Painless open source anomaly detection for your metrics 📈📉🚀 ☆97 Updated this week
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 Updated 6 months ago
- Contribute to dlt verified sources 🔥 ☆82 Updated 2 weeks ago
- DuckDB Community Extension to prompt LLMs from SQL ☆44 Updated 2 months ago
- A curated list of awesome DataOps tools ☆177 Updated 5 months ago
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆222 Updated last month
- An SDK for working with LLMs and AI Agents from Apache Airflow, based on Pydantic AI ☆25 Updated this week
- A simple DAG for executing LLM calls and using tools. ☆41 Updated last year
- A playground for running duckdb as a stateless query engine over a data lake ☆192 Updated last year
- Python package for querying iceberg data through duckdb. ☆64 Updated last year
- Plugins, extensions, case studies, articles, and video tutorials for Kedro ☆73 Updated 3 months ago