Joblib Apache Spark Backend
☆249 · Apr 7, 2025 · Updated 10 months ago
Alternatives and similar repositories for joblib-spark
Users who are interested in joblib-spark are comparing it to the libraries listed below.
- Spark implementation of computing Shapley values using Monte Carlo approximation ☆80 · Mar 20, 2023 · Updated 2 years ago
- Distributed scikit-learn meta-estimators in PySpark ☆288 · Apr 26, 2025 · Updated 10 months ago
- PyTorch Flexible Hash Embeddings ☆28 · Feb 4, 2020 · Updated 6 years ago
- PySpark methods to enhance developer productivity 📣 👯 🎉 ☆683 · Mar 6, 2025 · Updated last year
- ☆24 · Jan 8, 2019 · Updated 7 years ago
- Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f… ☆1,879 · Jan 2, 2026 · Updated 2 months ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Nov 9, 2023 · Updated 2 years ago
- PySpark test helper methods with beautiful error messages ☆753 · Feb 25, 2026 · Updated last week
- GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs ☆1,136 · Updated this week
- A simplified version of featuretools for Spark ☆31 · Jun 14, 2019 · Updated 6 years ago
- VSCode extension to work with Databricks ☆134 · Updated this week
- Extra blocks for scikit-learn pipelines. ☆1,382 · Updated this week
- plotly-scientific-tools is meant to augment the plotly and dash visualization libraries for Python. It is designed to combine rapid and b… ☆27 · Jul 18, 2025 · Updated 7 months ago
- An open source Python library for automated feature engineering ☆7,617 · Feb 3, 2026 · Updated last month
- Machine learning enhancements to Spark MLlib ☆20 · Mar 19, 2015 · Updated 10 years ago
- A Time Series Library for Apache Spark ☆1,022 · Jul 3, 2020 · Updated 5 years ago
- VertMetric: An abstractive summarization evaluation package. VERT stands for Versatile Evaluation of Reduced Texts. ☆11 · Dec 20, 2018 · Updated 7 years ago
- Building recommender systems using contextual bandit methods to address the cold-start issue and online real-time learning ☆13 · Jul 1, 2021 · Updated 4 years ago
- Demo of an in-database processing tool for scikit-learn ☆13 · Oct 18, 2022 · Updated 3 years ago
- Taxi fare prediction using TensorFlow Probability ☆15 · Jul 23, 2019 · Updated 6 years ago
- Auto Generate Airflow's dag.py On The Fly ☆10 · Feb 10, 2025 · Updated last year
- Collections of Slides for FOSS4G 2014 ☆10 · Sep 29, 2014 · Updated 11 years ago
- Performance of various open source GBM implementations ☆224 · Feb 17, 2026 · Updated 2 weeks ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆233 · Jan 20, 2026 · Updated last month
- A tool to get better debug info on Spark's memory usage ☆42 · Aug 21, 2019 · Updated 6 years ago
- Create HTML profiling reports from Apache Spark DataFrames ☆197 · Feb 2, 2020 · Updated 6 years ago
- An open protocol for secure data sharing ☆920 · Updated this week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,363 · Feb 10, 2026 · Updated 3 weeks ago
- Backend implementation for running MLFlow projects on Hadoop/YARN. ☆11 · Dec 27, 2022 · Updated 3 years ago
- A few end to end examples that use data-describe ☆17 · May 2, 2023 · Updated 2 years ago
- Active learning of GP hyperparameters following Garnett, et al., "Active Learning of Linear Embeddings for Gaussian Processes," (UAI 2014… ☆16 · Aug 4, 2017 · Updated 8 years ago
- Helper files for using `Julia` with MTH229. ☆14 · Jun 8, 2024 · Updated last year
- ☆10 · Dec 3, 2020 · Updated 5 years ago
- The Open Source Feature Store for AI/ML ☆6,756 · Updated this week
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆444 · Jul 16, 2025 · Updated 7 months ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,588 · Feb 17, 2026 · Updated 2 weeks ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 · Nov 10, 2025 · Updated 3 months ago
- The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, … ☆24,485 · Updated this week
- Qubole Sparklens tool for performance tuning Apache Spark ☆590 · Jun 26, 2024 · Updated last year