minrk/findspark

☆525

Alternatives and similar repositories for findspark

Users who are interested in findspark are comparing it to the libraries listed below.

apache / incubator-toree
Mirror of Apache Toree (Incubating)
☆750 · Updated this week
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
almond-sh / almond
A Scala kernel for Jupyter
☆1,624 · Jul 20, 2026 · Updated last week
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,372 · Mar 20, 2024 · Updated 2 years ago
dropbox / PyHive
Python interface to Hive and Presto. 🐝
☆1,696 · Apr 13, 2026 · Updated 3 months ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,555 · Apr 20, 2026 · Updated 3 months ago
spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
☆3,142 · May 16, 2023 · Updated 3 years ago
stitchfix / s3drive
S3 backed ContentsManager for jupyter notebooks
☆14 · Feb 10, 2016 · Updated 10 years ago
lensacom / sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
☆1,151 · Dec 31, 2020 · Updated 5 years ago
databricks / spark-redshift
Redshift data source for Apache Spark
☆608 · Aug 10, 2023 · Updated 2 years ago
svenkreiss / pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
☆270 · Sep 3, 2024 · Updated last year
databricks / spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
☆1,071 · Dec 3, 2019 · Updated 6 years ago
graphframes / graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
☆1,196 · Updated this week
dask / old-dask-examples
Collection of dask example notebooks
☆57 · Feb 14, 2018 · Updated 8 years ago
radanalyticsio / silex
something to help you spark
☆65 · Oct 23, 2018 · Updated 7 years ago
nchammas / flintrock
A command-line tool for launching Apache Spark clusters.
☆651 · Dec 13, 2024 · Updated last year
dask / dask
Parallel computing with task scheduling
☆13,871 · Updated this week
zero323 / pyspark-stubs
Apache (Py)Spark type annotations (stub files).
☆118 · Aug 17, 2022 · Updated 3 years ago
apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆43,716 · Updated this week
adtech-labs / spylon-kernel
Jupyter kernel for scala and spark
☆191 · Jan 11, 2024 · Updated 2 years ago
wesm / feather
Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
☆2,756 · Dec 8, 2025 · Updated 7 months ago
dask / knit
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
☆54 · Jul 3, 2018 · Updated 8 years ago
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,836 · Mar 3, 2026 · Updated 4 months ago
blaze / odo
Data Migration for the Blaze Project
☆1,006 · Jul 15, 2022 · Updated 4 years ago
martindurant / fastparquet
python implementation of the parquet columnar file format.
☆21 · Jun 29, 2026 · Updated last month
databricks / spark-deep-learning
Deep Learning Pipelines for Apache Spark
☆1,989 · Mar 30, 2023 · Updated 3 years ago
potix2 / spark-google-spreadsheets
Google Spreadsheets datasource for SparkSQL and DataFrames
☆58 · Jul 24, 2023 · Updated 3 years ago
yahoo / TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
☆3,846 · Jul 10, 2023 · Updated 3 years ago
jcrobak / parquet-python
python implementation of the parquet columnar file format.
☆362 · Oct 26, 2021 · Updated 4 years ago
dask / dask-searchcv
dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
☆239 · Oct 13, 2018 · Updated 7 years ago
combust / mleap
MLeap: Deploy ML Pipelines to Production
☆1,539 · Jul 21, 2026 · Updated last week
microsoft / SynapseML
Simple and Distributed Machine Learning
☆5,233 · Updated this week
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆79 · Jun 23, 2019 · Updated 7 years ago
jupyterhub / jupyterhub
Multi-user server for Jupyter notebooks
☆8,329 · Updated this week
cloudera / livy
Livy is an open source REST interface for interacting with Apache Spark from anywhere
☆1,007 · Oct 5, 2022 · Updated 3 years ago
g1thubhub / phil_stopwatch
☆39 · Mar 4, 2019 · Updated 7 years ago
jupyter-scala / ammonium
Impatient fork of Ammonite
☆63 · Jul 30, 2018 · Updated 7 years ago
databricks / spark-csv
CSV Data Source for Apache Spark 1.x
☆1,057 · Dec 13, 2018 · Updated 7 years ago
PiercingDan / spark-Jupyter-AWS
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
☆260 · Nov 3, 2017 · Updated 8 years ago