joblib / joblib-spark
Joblib Apache Spark Backend
☆249 · Apr 7, 2025 · Updated 10 months ago
Alternatives and similar repositories for joblib-spark
Users who are interested in joblib-spark are comparing it to the libraries listed below.
- Spark implementation of computing Shapley values using Monte Carlo approximation ☆80 · Mar 20, 2023 · Updated 2 years ago
- Distributed scikit-learn meta-estimators in PySpark ☆287 · Apr 26, 2025 · Updated 9 months ago
- PyTorch Flexible Hash Embeddings ☆28 · Feb 4, 2020 · Updated 6 years ago
- ☆24 · Jan 8, 2019 · Updated 7 years ago
- Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f… ☆1,876 · Jan 2, 2026 · Updated last month
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Nov 9, 2023 · Updated 2 years ago
- A simplified version of featuretools for Spark ☆31 · Jun 14, 2019 · Updated 6 years ago
- GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs ☆1,135 · Feb 6, 2026 · Updated last week
- PySpark test helper methods with beautiful error messages ☆752 · Jan 13, 2026 · Updated last month
- A pure Python implementation of Apache Spark's RDD and DStream interfaces. ☆270 · Sep 3, 2024 · Updated last year
- Extra blocks for scikit-learn pipelines. ☆1,377 · Updated this week
- An open source python library for automated feature engineering ☆7,610 · Feb 3, 2026 · Updated last week
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆809 · Feb 5, 2026 · Updated last week
- Simple and Distributed Machine Learning ☆5,198 · Updated this week
- Machine learning enhancements to Spark MLlib ☆20 · Mar 19, 2015 · Updated 10 years ago
- A Time Series Library for Apache Spark ☆1,020 · Jul 3, 2020 · Updated 5 years ago
- Auto Generate Airflow's dag.py On The Fly ☆10 · Feb 10, 2025 · Updated last year
- Demo of an In-database processing tool for scikit-learn ☆13 · Oct 18, 2022 · Updated 3 years ago
- Self-hosted email subscriptions list using serverless AWS stack ☆11 · Aug 31, 2020 · Updated 5 years ago
- Taxi fare prediction using tensorflow probability ☆15 · Jul 23, 2019 · Updated 6 years ago
- VertMetric: An abstractive summarization evaluation package. VERT stands for Versatile Evaluation of Reduced Texts. ☆11 · Dec 20, 2018 · Updated 7 years ago
- Collections of Slides for FOSS4G 2014 ☆10 · Sep 29, 2014 · Updated 11 years ago
- Building recommender systems using contextual bandit methods to address the cold-start issue and online real-time learning ☆13 · Jul 1, 2021 · Updated 4 years ago
- Performance of various open source GBM implementations ☆223 · Nov 6, 2025 · Updated 3 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 · Jan 20, 2026 · Updated 3 weeks ago
- A tool to get better debug info on Spark's memory usage ☆42 · Aug 21, 2019 · Updated 6 years ago
- Create HTML profiling reports from Apache Spark DataFrames ☆197 · Feb 2, 2020 · Updated 6 years ago
- An open protocol for secure data sharing ☆919 · Feb 6, 2026 · Updated last week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,357 · Updated this week
- ☆10 · Dec 3, 2020 · Updated 5 years ago
- Active learning of GP hyperparameters following Garnett, et al., "Active Learning of Linear Embeddings for Gaussian Processes," (UAI 2014… ☆16 · Aug 4, 2017 · Updated 8 years ago
- A few end to end examples that use data-describe ☆17 · May 2, 2023 · Updated 2 years ago
- Helper files for using `Julia` with MTH229. ☆14 · Jun 8, 2024 · Updated last year
- Backend implementation for running MLflow projects on Hadoop/YARN. ☆11 · Dec 27, 2022 · Updated 3 years ago
- The Open Source Feature Store for AI/ML ☆6,702 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,580 · Feb 2, 2026 · Updated last week
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 · Nov 10, 2025 · Updated 3 months ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆589 · Jun 26, 2024 · Updated last year
- MLeap: Deploy ML Pipelines to Production ☆1,532 · Jan 12, 2026 · Updated last month