Possibly the fastest DataFrame-agnostic quality check library in town.
☆240 · Feb 5, 2026 · Updated last month
Alternatives and similar repositories for cuallee
Users who are interested in cuallee are comparing it to the libraries listed below.
- ☆15 · Dec 11, 2023 · Updated 2 years ago
- Turning PySpark Into a Universal DataFrame API ☆493 · Updated this week
- The smallest DuckDB SQL orchestrator on Earth. ☆337 · Nov 22, 2025 · Updated 3 months ago
- Cost Efficient Data Pipelines with DuckDB ☆63 · May 14, 2025 · Updated 9 months ago
- Lightweight and extensible compatibility layer between dataframe libraries! ☆1,542 · Updated this week
- Feature engineering library that helps you keep track of feature dependencies, documentation and schema ☆28 · Jan 21, 2022 · Updated 4 years ago
- ☆16 · Apr 26, 2024 · Updated last year
- A command line app that makes Git easy. ☆11 · Oct 25, 2021 · Updated 4 years ago
- A custom end-to-end analytics platform for customer churn ☆11 · May 15, 2025 · Updated 9 months ago
- A high-performance data streaming system using DuckDB and Apache Arrow Flight. ☆96 · Feb 22, 2025 · Updated last year
- ☆30 · Dec 4, 2024 · Updated last year
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆683 · Mar 6, 2025 · Updated last year
- Delta Lake helper methods in PySpark ☆327 · Jan 19, 2026 · Updated last month
- An implementation of Measures in SQL as a DuckDB extension ☆43 · Feb 25, 2026 · Updated last week
- Scalable and efficient data transformation framework - backwards compatible with dbt. ☆2,928 · Updated this week
- A repository of blogs/videos that presents how Apache Iceberg is being used in Production by various orgs ☆18 · Jul 31, 2023 · Updated 2 years ago
- Sentiment and language detection for text analytics. ☆17 · Jul 3, 2024 · Updated last year
- A data modelling layer built on top of polars and pydantic ☆608 · Feb 4, 2026 · Updated last month
- Primary repository for NYC DCP's Data Engineering team ☆34 · Updated this week
- 🏁 A sweet and speedy code generator for dbt 🏎️✨ ☆32 · Jan 23, 2026 · Updated last month
- Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and… ☆2,413 · Mar 1, 2026 · Updated last week
- Python API for Deequ ☆813 · Updated this week
- A Python Library to support running data quality rules while the spark job is running⚡ ☆200 · Updated this week
- ☆22 · Nov 30, 2022 · Updated 3 years ago
- ☆26 · Nov 14, 2024 · Updated last year
- The best Python package for comparing two dataframes ☆11 · Dec 29, 2021 · Updated 4 years ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆282 · Updated this week
- Code to demonstrate data engineering metadata & logging best practices ☆21 · Mar 12, 2024 · Updated last year
- a convenient way to anonymize your data for analytics ☆22 · Nov 7, 2021 · Updated 4 years ago
- PySpark test helper methods with beautiful error messages ☆753 · Feb 25, 2026 · Updated last week
- Minimal plugin loading package for polars with optional type stub generation ☆20 · Jan 29, 2026 · Updated last month
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆13 · May 24, 2024 · Updated last year
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,140 · Feb 21, 2026 · Updated 2 weeks ago
- [Project moved] Polars integration for Dagster ☆35 · Apr 17, 2025 · Updated 10 months ago
- A native Rust library for Delta Lake, with bindings into Python ☆3,160 · Mar 2, 2026 · Updated last week
- Beautifully colored, quick and simple Python logging ☆42 · May 22, 2021 · Updated 4 years ago
- Deploy multiple Dagster data pipelines on Docker environment ☆23 · Apr 23, 2024 · Updated last year
- Local development environment for python data projects, with Docker ☆23 · Dec 14, 2022 · Updated 3 years ago
- ☆31 · Dec 15, 2023 · Updated 2 years ago