svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

☆270

Alternatives and similar repositories for pysparkling

Users interested in pysparkling also compare it with the libraries listed below.

ekampf / PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
☆393 · Jan 21, 2024 · Updated 2 years ago
lensacom / sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
☆1,151 · Dec 31, 2020 · Updated 5 years ago
TomAugspurger / DSADD
A python package for defensive data analysis.
☆17 · Jun 22, 2015 · Updated 11 years ago
pymc-devs / uq_chapter
Uncertainty quantification book chapter
☆49 · Aug 13, 2015 · Updated 10 years ago
mvaz / osqf2015
☆25 · Jun 5, 2015 · Updated 11 years ago
fonnesbeck / scipy2015_tutorial
Computational Statistics II Tutorial at SciPy 2015
☆48 · Jul 15, 2015 · Updated 11 years ago
bentaylordata / datascience
Data science repo to help others
☆12 · Feb 10, 2016 · Updated 10 years ago
jakevdp / git-intro
Git/Github Intro
☆13 · Jun 17, 2015 · Updated 11 years ago
hi-primus / optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
☆1,536 · Dec 2, 2024 · Updated last year
instagibbs / FactorizationMachine
☆36 · May 12, 2015 · Updated 11 years ago
wrobstory / pelican_dynamic
Easily embed custom JS and CSS in your Pelican blog articles
☆53 · Jul 28, 2018 · Updated 8 years ago
minrk / findspark
☆525 · Mar 1, 2026 · Updated 4 months ago
databricks / spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
☆1,071 · Dec 3, 2019 · Updated 6 years ago
libdynd / dynd-python
Python exposure of dynd
☆122 · Jun 21, 2022 · Updated 4 years ago
bolt-project / bolt
Unified interface for local and distributed ndarrays
☆157 · Oct 13, 2018 · Updated 7 years ago
gkamradt / ryd.io
☆35 · May 4, 2016 · Updated 10 years ago
sparklingpandas / sparklingpandas
Sparkling Pandas
☆361 · Jul 6, 2023 · Updated 3 years ago
ottogroup / palladium
Framework for setting up predictive analytics services
☆493 · Jul 9, 2026 · Updated 2 weeks ago
DeloitteHux-Old / proficiency-metric
☆28 · Jun 6, 2016 · Updated 10 years ago
mrocklin / pymarkdown
Evaluate code in markdown
☆43 · Aug 11, 2015 · Updated 10 years ago
jadianes / spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
☆1,659 · Mar 16, 2024 · Updated 2 years ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
joblib / joblib-spark
Joblib Apache Spark Backend
☆249 · Mar 24, 2026 · Updated 4 months ago
DistrictDataLabs / spark-workshop
Data and code for "Fast Data Applications with Spark and Python"
☆25 · Sep 11, 2016 · Updated 9 years ago
zero323 / pyspark-stubs
Apache (Py)Spark type annotations (stub files).
☆118 · Aug 17, 2022 · Updated 3 years ago
dolaameng / tutorials
Different types of tutorials, such as machine learning, image processing, etc.
☆100 · Apr 3, 2016 · Updated 10 years ago
thunder-project / thunder
scalable analysis of images and time series
☆822 · Jan 6, 2017 · Updated 9 years ago
dask / dask-tensorflow
☆93 · Jan 8, 2020 · Updated 6 years ago
ContinuumIO / cdx
☆27 · Jul 31, 2023 · Updated 2 years ago
mara / mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
☆2,089 · Dec 15, 2023 · Updated 2 years ago
infOpen / ansible-role-airflow
Ansible role to deploy and configure Airflow
☆41 · Jul 21, 2026 · Updated last week
jkthompson / pyspark-pictures
Learn the pyspark API through pictures and simple examples
☆169 · Jan 23, 2021 · Updated 5 years ago
davidjurgens / similarity-tutorial
A collection of documents and materials for the EMNLP-2015 Semantic Similarity tutorial
☆30 · Sep 30, 2015 · Updated 10 years ago
wikimedia / analytics-kafkatee
Github mirror of "analytics/kafkatee" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access…
☆20 · Nov 23, 2023 · Updated 2 years ago
blaze / odo
Data Migration for the Blaze Project
☆1,006 · Jul 15, 2022 · Updated 4 years ago
tubular / sparkly
Helpers & syntactic sugar for PySpark.
☆62 · Dec 4, 2025 · Updated 7 months ago
michaelosthege / fairflow
Functional Airflow DAG definitions.
☆38 · Jul 4, 2017 · Updated 9 years ago
rssanders3 / airflow-spark-operator-plugin
A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator
☆73 · Sep 20, 2019 · Updated 6 years ago
jreback / StrataNYC2015
Strata NYC 2015
☆15 · Sep 30, 2015 · Updated 10 years ago