Quantmetry / pipeasy-spark
An easy way to define data preprocessing pipelines (similar to sklearn-pandas but for Spark ML)
☆17 Updated 6 years ago
Alternatives and similar repositories for pipeasy-spark:
Users that are interested in pipeasy-spark are comparing it to the libraries listed below
- Example usage of scikit-hts ☆57 Updated 2 years ago
- Hierarchical Time Series Forecasting with a familiar API ☆224 Updated last year
- Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library ☆51 Updated 2 years ago
- Making Artificial Intelligence techniques available to every citizen, intended to help make sense of the large number of… ☆12 Updated 6 months ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 Updated last year
- A list of repositories commonly used @ Quantmetry ☆14 Updated 5 years ago
- Supervised forecasting of sequential data in Python. ☆55 Updated 6 years ago
- Embed categorical variables via neural networks. ☆59 Updated last year
- Model Error Analysis for scikit-learn models. ☆29 Updated 3 years ago
- Repo for the ML_Insights python package ☆149 Updated last year
- Surrogate Assisted Feature Extraction ☆36 Updated 3 years ago
- A toolbox for fair and explainable machine learning ☆55 Updated 8 months ago
- A catalog of Jupyter Notebooks presenting new techniques to interpret black box machine learning models. ☆15 Updated 6 years ago
- Hierarchical Time Series Forecasting using Prophet ☆144 Updated 4 years ago
- mlmachine accelerates machine learning experimentation ☆30 Updated 3 years ago
- General Interpretability Package ☆58 Updated 2 years ago
- Better `keras` models for time series and beyond ☆61 Updated last year
- Visualization ideas for data science ☆20 Updated 6 years ago
- Bringing back uncertainty to machine learning. ☆50 Updated 8 months ago
- TSFresh primitives for featuretools ☆36 Updated 2 years ago
- ☆22 Updated 5 years ago
- this repo might get accepted ☆29 Updated 4 years ago
- Developmental tools to detect data drift ☆14 Updated 11 months ago
- Distributed, large-scale, benchmarking framework for rigorous assessment of automatic machine learning repositories, projects, and librar… ☆30 Updated 2 years ago
- An extension of CatBoost to probabilistic modelling ☆142 Updated last year
- Random stuff I've been working on ☆28 Updated last year
- Time Series Forecasting Framework ☆41 Updated 2 years ago
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames. ☆103 Updated 5 years ago
- Scripts for paper "Encoding high-cardinality string categorical variables" ☆24 Updated 5 years ago
- An attention-based Recurrent Neural Net multi-touch attribution model in a supervised learning fashion of predicting if a series of event… ☆30 Updated 3 years ago