joblib / joblib-spark
Joblib Apache Spark Backend
☆245Updated 3 weeks ago
Alternatives and similar repositories for joblib-spark:
Users who are interested in joblib-spark are comparing it to the libraries listed below.
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames.☆103Updated 5 years ago
- A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scal…☆239Updated last month
- HandySpark - bringing pandas-like capabilities to Spark dataframes☆193Updated 5 years ago
- Distributed scikit-learn meta-estimators in PySpark☆284Updated last week
- Create HTML profiling reports from Apache Spark DataFrames☆196Updated 5 years ago
- ☆522Updated 3 years ago
- Distributed XGBoost on Ray☆148Updated 10 months ago
- Spark implementation of computing Shapley values using Monte Carlo approximation☆74Updated 2 years ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎☆500Updated 3 months ago
- MLflow samples - deprecated☆22Updated last year
- Jupyter kernel for Scala and Spark☆188Updated last year
- A tool and library for easily deploying applications on Apache YARN☆143Updated last year
- A tool for building feature stores.☆301Updated 3 weeks ago
- Apache (Py)Spark type annotations (stub files).☆117Updated 2 years ago
- XGBoost GPU accelerated on Spark example applications☆52Updated 2 years ago
- Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops☆117Updated 2 years ago
- Train and run PyTorch models on Apache Spark.☆339Updated last year
- Python automatic data quality check toolkit☆283Updated 4 years ago
- Deploy Dask on YARN clusters☆69Updated 8 months ago
- Python library for converting Apache Spark ML pipelines to PMML☆97Updated 2 months ago
- Isolation Forest on Spark☆227Updated 6 months ago
- MLOps Platform☆271Updated 6 months ago
- A library that provides useful extensions to Apache Spark and PySpark.☆223Updated last month
- Distributed SQL Engine in Python using Dask☆404Updated 8 months ago
- A collection of Machine Learning examples to get started with deploying RAPIDS in the Cloud☆141Updated 6 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou…☆113Updated last year
- Resources for Data Science Process management☆204Updated 5 years ago
- PostgreSQL offline and online stores for Feast☆32Updated 3 years ago
- The Synthetic Minority Oversampling Technique (SMOTE) implemented in Spark.☆48Updated 6 years ago
- Python API for Deequ☆766Updated last month