rakutentech / spark-dirty-cat
Similarity encoding of dirty categorical variables (strings)
☆20 Updated 6 years ago
Alternatives and similar repositories for spark-dirty-cat
Users that are interested in spark-dirty-cat are comparing it to the libraries listed below
- 🧮 Extended Latent Dirichlet Allocation for Collaborative Filtering in Recommender Systems. ☆42 Updated 3 years ago
- Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library ☆51 Updated 3 years ago
- Pipeline components that support partial_fit. ☆46 Updated last year
- Helpers for scikit learn ☆16 Updated 2 years ago
- 🪜 Bayesian Hierarchical Models at Scale ☆51 Updated 4 years ago
- Python implementation of R package breakDown ☆43 Updated 2 years ago
- Exploratory repository to study predictive survival analysis models ☆37 Updated 2 years ago
- Missing data amputation and exploration functions for Python ☆72 Updated 2 years ago
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames. ☆102 Updated 6 years ago
- In which I play with the ideas surrounding causality ☆53 Updated 3 years ago
- Ensemble topic modelling with pLSA ☆114 Updated 4 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆76 Updated 2 years ago
- Prune your sklearn models ☆19 Updated 11 months ago
- Gradient boosting on steroids ☆28 Updated last year
- 📈🔍 Lets Python do AB testing analysis. ☆78 Updated 5 months ago
- Logistic regression with bound and linear constraints. L1, L2 and Elastic-Net regularization. ☆33 Updated 2 years ago
- How to use SHAP values for better cluster analysis ☆59 Updated 3 years ago
- CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system ☆77 Updated 2 years ago
- Phi_K correlation analyzer library ☆167 Updated this week
- Record matching and entity resolution at scale in Spark ☆35 Updated last year
- In-Session Personalization Workshop for eCommerce, April 2021, and the MICES Workshop in June 2021. ☆22 Updated 4 years ago
- Cyclic Boosting Machines - an explainable supervised machine learning algorithm ☆61 Updated last year
- Embed categorical variables via neural networks. ☆59 Updated 2 years ago
- 💊 Comparing causality methods in a fair and just way. ☆140 Updated 5 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆106 Updated 2 years ago
- Bag of, not words, but tricks! ☆68 Updated last year
- Example usage of scikit-hts ☆57 Updated 3 years ago
- this repo might get accepted ☆28 Updated 4 years ago
- A Python package for Bayesian A/B Testing ☆61 Updated 2 years ago