A Scalable Data Cleaning Library for PySpark.
☆29Apr 4, 2019Updated 7 years ago
Alternatives and similar repositories for SparkClean
Users that are interested in SparkClean are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Manage Apache Atlas and Ranger configuration for your Hadoop environment.☆16May 4, 2021Updated 5 years ago
- [译] zeppelin 中文文档☆14Jul 21, 2023Updated 2 years ago
- A pyspark lib to validate data quality☆19Nov 11, 2022Updated 3 years ago
- featselector是一个基于统计分析和模型选择的特征选择器.☆14Mar 4, 2019Updated 7 years ago
- ETL jobs for Firefox Telemetry☆29May 7, 2026Updated 2 weeks ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Local Development of AWS Glue with Docker and Visual Studio Code☆14Nov 29, 2021Updated 4 years ago
- A collection of tools that help me work with Avro☆23Jan 7, 2010Updated 16 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- Rasa Chatbot using Django backend and Sockets for communication☆12Dec 8, 2022Updated 3 years ago
- ☆14Sep 14, 2021Updated 4 years ago
- Fuzzy Categorical Distances☆14Mar 31, 2020Updated 6 years ago
- Distributed stock price forecasting system to predict S&P 500 stock prices.☆11Nov 12, 2021Updated 4 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach…☆19Jun 23, 2016Updated 9 years ago
- ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…☆11Mar 9, 2022Updated 4 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Power Plant ML Pipeline Application - Apache Spark☆12Dec 12, 2016Updated 9 years ago
- ☆16Jul 25, 2025Updated 9 months ago
- This data analysis provided information for the March 6th, 2018, NYC Open Data Week event hosted by the Two Sigma Data Clinic, "The State…☆13Jan 9, 2025Updated last year
- Filipino multi-modal NLP dataset. Consists of 350k+ Filipino news articles and associated images☆14Mar 11, 2025Updated last year
- python interface to bnlearn and other probabilistic graphical model libraries