NYUBigDataProject / SparkClean
A Scalable Data Cleaning Library for PySpark.
☆27 Updated 6 years ago
Alternatives and similar repositories for SparkClean
Users who are interested in SparkClean compare it to the libraries listed below.
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Set of IPython and Jupyter extensions to improve user experience ☆50 Updated 5 years ago
- Sample techniques for a variety of feature extraction methods ☆31 Updated 4 years ago
- Example project for running LensKit experiments ☆13 Updated last month
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- ☆16 Updated 2 years ago
- CentOS-based Docker container for time series analysis and modeling. ☆21 Updated 5 years ago
- ☆26 Updated 9 years ago
- ☆11 Updated 6 years ago
- Model management example using Polyaxon, Argo and Seldon ☆23 Updated 6 years ago
- Query Spark in real time and visualise the results as a graph. ☆24 Updated 7 years ago
- Materials for Machine Learning with H2O Open Platform at ODSC Masterclass Summit 2017 ☆12 Updated 8 years ago
- ☆19 Updated 4 years ago
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- A curated list of articles, papers and tools for managing the building and deploying of machine learning models, aka machine learning eng… ☆18 Updated 6 years ago
- ☆16 Updated 7 years ago
- Mastering Spark for Data Science, published by Packt ☆47 Updated 2 years ago
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score. ☆13 Updated 4 years ago
- Notebooks for nlp-on-spark ☆13 Updated 8 years ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆36 Updated 4 years ago
- Instant search for and access to many datasets in PySpark. ☆34 Updated 2 years ago
- Creating a tunable and explainable recommendation system ☆38 Updated 5 years ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 Updated 5 years ago
- ☆15 Updated 5 years ago
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation. ☆37 Updated 5 years ago
- Building an API with the FastAPI framework to serve a scikit-learn model. ☆18 Updated 6 years ago
- A simplified version of featuretools for Spark ☆31 Updated 5 years ago
- 📝 A blog post about report generation and automation in Python ☆40 Updated 5 years ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 Updated 4 years ago