NYUBigDataProject / SparkClean
A Scalable Data Cleaning Library for PySpark.
☆27 Updated 6 years ago
Alternatives and similar repositories for SparkClean
Users who are interested in SparkClean are comparing it to the libraries listed below.
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- PySpark, Databricks, H2O, MLlib ☆18 Updated 8 years ago
- Set of IPython and Jupyter extensions to improve user experience ☆50 Updated 5 years ago
- 📝 A blog post about report generation and automation in Python ☆40 Updated 5 years ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 Updated 4 years ago
- A curated list of articles, papers and tools for managing the building and deploying of machine learning models, aka machine learning eng… ☆18 Updated 6 years ago
- Example project for running LensKit experiments ☆13 Updated last month
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Spark (Scala and Python) ☆18 Updated 5 years ago
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Large-scale Graph Mining with Spark ☆40 Updated 6 years ago
- Public repository made for Automated Feature Engineering workshop (Summer Data Conf, Odessa, 2018-07-21) ☆19 Updated 6 years ago
- Query Spark in real time and visualise the results as a graph. ☆24 Updated 7 years ago
- Productivity Utilities for Data Science with Python Notebooks ☆6 Updated 5 years ago
- Instant search for and access to many datasets in PySpark.