rakutentech / spark-dirty-cat

Similarity encoding of dirty categorical variables (strings)

☆20

Alternatives and similar repositories for spark-dirty-cat

Users who are interested in spark-dirty-cat are comparing it to the libraries listed below.

FlorianWilhelm / lda4rec
🧮 Extended Latent Dirichlet Allocation for Collaborative Filtering in Recommender Systems.
☆42 Updated 3 years ago
alteryx / categorical_encoding
Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library
☆51 Updated 2 years ago
MI2DataLab / pyBreakDown
Python implementation of R package breakDown
☆43 Updated 2 years ago
FlorianWilhelm / bhm-at-scale
🪜 Bayesian Hierarchical Models at Scale
☆51 Updated 3 years ago
david26694 / sktools
Helpers for scikit learn
☆16 Updated 2 years ago
inovex / justcause
💊 Comparing causality methods in a fair and just way.
☆139 Updated 5 years ago
zelros / cinnamon
CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system
☆77 Updated 2 years ago
ModelOriented / SAFE
Surrogate Assisted Feature Extraction
☆37 Updated 3 years ago
dkn22 / embedder
Embed categorical variables via neural networks.
☆59 Updated 2 years ago
ColtAllen / btyd
Buy Till You Die and Customer Lifetime Value statistical models in Python.
☆117 Updated last year
soda-inria / survival-analysis-benchmark
Exploratory repository to study predictive survival analysis models
☆34 Updated 2 years ago
RickardKarl / causal-falsify
causal-falsify: A Python library with algorithms for falsifying unconfoundedness assumption in a composite dataset from multiple sources.
☆31 Updated 3 weeks ago
koaning / scikit-partial
Pipeline components that support partial_fit.
☆46 Updated last year
RecList / evalRS-KDD-2023
Official Repository for EvalRS @ KDD 2023: a Rounded Evaluation of Recommender Systems
☆30 Updated last year
AidanCooper / shap-clustering
How to use SHAP values for better cluster analysis
☆59 Updated 3 years ago
aredier / trelawney
General Interpretability Package
☆58 Updated 2 years ago
lmcinnes / enstop
Ensemble topic modelling with pLSA
☆115 Updated 3 years ago
guillermo-navas-palencia / clogistic
Logistic regression with bound and linear constraints. L1, L2 and Elastic-Net regularization.
☆33 Updated 2 years ago
jphall663 / kdd_2019
Paper and talk from KDD 2019 XAI Workshop
☆20 Updated 5 years ago
tlentali / leab
📈🔍 Lets Python do AB testing analysis.
☆78 Updated 3 months ago
ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆35 Updated last year
numeristical / introspective
Repo for the ML_Insights python package
☆152 Updated 3 months ago
ericmjl / causality
In which I play with the ideas surrounding causality
☆53 Updated 3 years ago
formlio / forml
ForML - A development framework and MLOps platform for the lifecycle management of data science projects
☆107 Updated 2 years ago
corels / pycorels
Public home of pycorels, the python binding to CORELS
☆80 Updated 5 years ago
Bergvca / pyspark_dist_explore
Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames.
☆103 Updated 5 years ago
edublancas / ml-testing
🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects
☆81 Updated 3 years ago
ing-bank / industry2vec
☆29 Updated 6 years ago
aromain / intermarche_challenge
[Intermarché] Sales forecasting challenge
☆11 Updated 4 years ago
carlomazzaferro / scikit-hts-examples
Example usage of scikit-hts
☆57 Updated 3 years ago