Data validation library for PySpark 3.0.0
☆33 · Nov 11, 2022 · Updated 3 years ago
Alternatives and similar repositories for check-engine
Users that are interested in check-engine are comparing it to the libraries listed below
- ☆10 · Feb 18, 2021 · Updated 5 years ago
- ☆15 · May 31, 2023 · Updated 2 years ago
- Asynchronous actions for PySpark ☆48 · Dec 2, 2021 · Updated 4 years ago
- Terraform plans & commands to provision Azure VMSS and VM from a VM image on demand or from a Jenkins pipeline. ☆27 · Aug 9, 2018 · Updated 7 years ago
- A toolset to streamline running spark python on EMR ☆20 · Nov 16, 2016 · Updated 9 years ago
- Spark on Kubernetes ☆104 · Feb 20, 2023 · Updated 3 years ago
- Open source stack lakehouse ☆25 · Mar 2, 2024 · Updated 2 years ago
- ☆26 · Jul 9, 2023 · Updated 2 years ago
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- Deeper look into access control and data lake configuration ☆34 · Apr 13, 2022 · Updated 3 years ago
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆15 · Jan 23, 2025 · Updated last year
- Dashboard in Python with automatic updates and email notifications ☆10 · Jun 9, 2022 · Updated 3 years ago
- Data Catalog for Databases and Data Warehouses ☆36 · Jan 15, 2024 · Updated 2 years ago
- Example MLOps using BentoML & mlFlow ☆38 · May 9, 2021 · Updated 4 years ago
- Deep Learning for Computer Vision Practitioner Bundle examples and exercises ☆11 · Apr 23, 2019 · Updated 6 years ago
- Eppo Python SDK ☆12 · Nov 8, 2024 · Updated last year
- Manage Unity Catalog tables with Pydantic Models ☆10 · Mar 5, 2025 · Updated last year
- Scala library for parsing fixed length file format ☆13 · Oct 19, 2021 · Updated 4 years ago
- Playground site for creating/validating data contracts ☆11 · Aug 9, 2025 · Updated 7 months ago
- A low-level, cross-platform port scanner and packet flooder written in Rust. ☆13 · Mar 25, 2025 · Updated 11 months ago
- Prebuilt configurations for docker-rpm-builder ☆11 · Feb 5, 2021 · Updated 5 years ago
- GitHub action for running Python unit tests ☆10 · Jun 16, 2025 · Updated 8 months ago
- Architecture principles ☆13 · May 23, 2025 · Updated 9 months ago
- ☆10 · Jan 28, 2025 · Updated last year
- R dashboard as a designer ☆10 · Oct 29, 2015 · Updated 10 years ago
- The Data Product Specification ☆11 · Jan 28, 2025 · Updated last year
- M. Gągolewski, M. Bartoszuk, A. Cena, Przetwarzanie i analiza danych w języku Python, PWN, 2016 ☆10 · Jan 16, 2023 · Updated 3 years ago
- Collect and aggregate on spark events for profitz ☆10 · Apr 22, 2022 · Updated 3 years ago
- Datamallet is a Python library which contains several helper functions and modules for the common tasks in a typical data science workflow… ☆11 · May 19, 2022 · Updated 3 years ago
- Tracebacks for Humans (in Jupyter notebooks) ☆12 · Dec 30, 2025 · Updated 2 months ago
- This project provides sample code to implement API GW GraphQL APIs (sample code uses the Python GraphQL library Graphene) in Lambda functi… ☆10 · May 3, 2021 · Updated 4 years ago
- ☆95 · Aug 15, 2022 · Updated 3 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 7 months ago
- How to distribute a Python package with data: https://stackoverflow.com/questions/3596979/manifest-in-ignored-on-python-setup-py-install-… ☆14 · Dec 28, 2020 · Updated 5 years ago
- Data Engineer Roadmaps as Projects Funnel ☆11 · Aug 10, 2022 · Updated 3 years ago
- MiniLM (BERT) embeddings from scratch ☆19 · Aug 14, 2025 · Updated 6 months ago
- Dashboard showcasing Conjoint Analysis for the Electric Vehicle Lease Market (as at January 2020) in San Francisco ☆15 · Feb 19, 2020 · Updated 6 years ago
- ☆17 · Jul 18, 2014 · Updated 11 years ago