mfcabrera / hooqu
hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python.
☆29 Updated 4 months ago
Alternatives and similar repositories for hooqu:
Users who are interested in hooqu are comparing it to the libraries listed below:
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 Updated 8 months ago
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 Updated last year
- A python library bakeoff for medium sized datasets ☆24 Updated last year
- Dask integration for Snowflake ☆30 Updated 5 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated last year
- ☆32 Updated last year
- IbisML is a library for building scalable ML pipelines using Ibis. ☆108 Updated 4 months ago
- Inspect ML Pipelines in Python in the form of a DAG ☆70 Updated last year
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated 8 months ago
- ☆30 Updated 3 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- Assessing whether data from database complies with reference information. ☆42 Updated last week
- Batteries included toolkit for data engineering. ☆34 Updated 4 months ago
- A software engineering framework to jump start your machine learning projects ☆37 Updated 10 months ago
- Personal Finance Project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated last year
- Kedro Plugin to support running pipelines on Kubernetes using Airflow. ☆28 Updated last month
- Demo repository to lambda-fy your dbt runs ☆11 Updated last year
- ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow. ☆45 Updated last month
- Self-contained demo using Kafka, Materialize and Metabase to check what's streaming on Twitch. All you need is Docker and Twitch access t… ☆24 Updated 3 years ago
- real-time data + ML pipeline ☆54 Updated 3 weeks ago
- Efficient BM25 with DuckDB 🦆 ☆48 Updated 4 months ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 Updated 4 years ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save. ☆77 Updated last year
- Linear regression in SQL using dbt ☆70 Updated 3 months ago
- 🚕 Self-contained demo using Redpanda, Materialize, River, Redis, and Streamlit to predict taxi trip durations ☆46 Updated 2 years ago
- Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documen… ☆18 Updated last year
- A PaaS End-to-End ML Setup with Metaflow, Serverless and SageMaker. ☆37 Updated 4 years ago
- An abstraction layer for parameter tuning ☆35 Updated 8 months ago