NYUBigDataProject / SparkClean
A Scalable Data Cleaning Library for PySpark.
☆27 · Updated 6 years ago
Alternatives and similar repositories for SparkClean:
Users who are interested in SparkClean are comparing it to the libraries listed below.
- Public repository made for the Automated Feature Engineering workshop (Summer Data Conf, Odessa, 2018-07-21) ☆19 · Updated 6 years ago
- Example project for running LensKit experiments ☆13 · Updated 2 weeks ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Spark NLP for Streamlit ☆15 · Updated 3 years ago
- Query Spark in real time and visualise the results as a graph. ☆24 · Updated 7 years ago
- NLP tool for optimizing a resume for a job description, computing similarity, and extracting skills ☆16 · Updated 7 years ago
- Set of IPython and Jupyter extensions to improve user experience ☆50 · Updated 5 years ago
- Sample techniques for a variety of feature extraction methods ☆31 · Updated 4 years ago
- ☆11 · Updated 6 years ago
- CentOS-based Docker container for time series analysis and modeling. ☆21 · Updated 5 years ago
- Real-time social media data analytics with Apache Spark, Python, Kafka, Pandas, etc. ☆51 · Updated 8 years ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 · Updated 4 years ago
- Automated exploratory data analysis that simplifies data exploration ☆35 · Updated 4 years ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- A simple introduction to using Spark ML pipelines ☆26 · Updated 7 years ago
- ☆19 · Updated 4 years ago
- ☆16 · Updated 2 years ago
- Topic modelling on financial news with natural language processing ☆59 · Updated 7 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values …