NYUBigDataProject / SparkClean
A Scalable Data Cleaning Library for PySpark.
☆26 Updated 5 years ago
Alternatives and similar repositories for SparkClean:
Users who are interested in SparkClean are comparing it to the libraries listed below.
- Set of iPython and Jupyter extensions to improve user experience ☆50 Updated 5 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated 11 months ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 Updated 4 years ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- PySpark, Databrick, h2o, MLlib ☆18 Updated 8 years ago
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Hierarchical Clustering Algorithms ☆35 Updated 2 years ago
- Example project for running LensKit experiments ☆13 Updated last week
- ☆11 Updated 6 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach… ☆19 Updated 8 years ago
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score. ☆13 Updated 4 years ago
- Topic modelling on financial news with Natural Language Processing ☆58 Updated 7 years ago
- A simplified version of featuretools for Spark ☆31 Updated 5 years ago
- ☆19 Updated 3 years ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago
- Predict whether a student will correctly answer a problem based on past performance using automated feature engineering ☆32 Updated 4 years ago
- Cookiecutter template for testing Python scikit-learn clustering learners. ☆17 Updated 2 years ago
- CentOS based Docker container for Time Series Analysis and Modeling. ☆21 Updated 5 years ago
- ☆12 Updated 4 years ago
- Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl ☆71 Updated last year
- Live Twitter sentiment analysis using Python, Apache Spark Streaming, Kafka, NLTK, SocketIO ☆20 Updated 7 years ago
- 📝 A blog post about report generation and automation in python ☆40 Updated 5 years ago
- library for conducting propensity matching on spark scale ☆14 Updated last year
- NLP tool for optimizing a resume for a job description, computing similarity, and extracting skills ☆16 Updated 7 years ago
- A cookiecutter template for Apache Spark applications written in Scala ☆10 Updated 6 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- ☆26 Updated 8 years ago
- Basic tutorial of using Apache Airflow ☆36 Updated 6 years ago
- A series of workshop modules introducing Feast feature store. ☆19 Updated 2 years ago
- Public repository made for Automated Feature Engineering workshop (Summer Data Conf, Odessa, 2018-07-21) ☆19 Updated 6 years ago