A PySpark library to validate data quality
☆19Nov 11, 2022Updated 3 years ago
Alternatives and similar repositories for owl-data-sanitizer
Users that are interested in owl-data-sanitizer are comparing it to the libraries listed below.
- Project to concentrate files and settings for AWS EMR monitoring. Source: https://aws.amazon.com/blogs/big-data/monitor-and-optimize-anal…☆15Oct 11, 2024Updated last year
- The Distributed Node2Vec Algorithm for Very Large Graphs☆18Jul 19, 2021Updated 4 years ago
- Access Amazon's AWS Athena API via reticulate and AWS official Python boto3 module☆10Sep 24, 2018Updated 7 years ago
- ☆18Updated this week
- Multi-stage, config driven, SQL based ETL framework using PySpark☆26Sep 16, 2019Updated 6 years ago
- A python package to create a database on the platform using our moj data warehousing framework☆21Mar 16, 2026Updated last month
- ☆16Jun 27, 2020Updated 5 years ago
- A dataset of Valence/Arousal detection with deezer Id and MSD Id as input☆36Oct 3, 2017Updated 8 years ago
- Creating Debian Packages from CRAN Sources☆12Jul 1, 2020Updated 5 years ago
- ☆11Oct 11, 2022Updated 3 years ago
- Utilities for Asyncpg☆15Jan 24, 2019Updated 7 years ago
- A Gentle introduction to Machine Learning with Apache Spark☆11Mar 2, 2026Updated last month
- Python wrapper for a C++ Double Metaphone☆15Jan 12, 2026Updated 3 months ago
- ☆10Jun 29, 2021Updated 4 years ago
- CLI Based Browser for S3 Buckets☆14Aug 12, 2016Updated 9 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆123Updated this week
- A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML☆15Dec 24, 2016Updated 9 years ago
- ☆12Oct 16, 2023Updated 2 years ago
- A Scalable Data Cleaning Library for PySpark.☆29Apr 4, 2019Updated 7 years ago
- ☆15Dec 10, 2015Updated 10 years ago
- Data validation library for PySpark 3.0.0☆33Nov 11, 2022Updated 3 years ago
- A CLI to manage and monitor permissions in AWS Lake Formation☆25Feb 8, 2023Updated 3 years ago
- Clojure library to explore inversion of control technique - in several senses.☆10May 14, 2024Updated last year
- Basic Spark utilities☆13Feb 20, 2025Updated last year
- Guide on how to set up Apache Airflow containers using Docker and IBM Bluemix☆11Feb 19, 2018Updated 8 years ago
- SparkSQL utils for ScalaPB☆43Jun 10, 2025Updated 10 months ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…☆16Oct 3, 2025Updated 6 months ago
- ☆11Nov 5, 2024Updated last year
- ☆21Dec 19, 2019Updated 6 years ago
- Re-usable python functions for decoupled interactions with CCA campus services☆10Jul 23, 2018Updated 7 years ago
- Due to lack of resources on how to deploy kafka with simple SASL authentication (just username and password) and how to write producer an…☆12Dec 29, 2021Updated 4 years ago
- (un)official css theme for KIT presentations with jupyter notebook slideshow☆13Sep 7, 2017Updated 8 years ago
- An LLM-powered chatbot with the added context of the dbt knowledge base.☆39Dec 4, 2024Updated last year
- R package for formatting ggplot2 charts and applying MoJ corporate colours.☆17Nov 7, 2024Updated last year
- Example to create lineage in Atlas with sqoop and spark☆14Apr 5, 2017Updated 9 years ago
- Tools for faster and optimized interaction with Teradata and large datasets.☆17Jul 11, 2018Updated 7 years ago
- Clusteval provides methods for unsupervised cluster validation☆70Feb 21, 2026Updated last month
- A Python client for managing connectors using the Kafka Connect API.☆12Oct 30, 2025Updated 5 months ago
- Rocksdb state storage implementation for Structured Streaming.☆17Oct 21, 2020Updated 5 years ago