svenkreiss / pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
☆270 · Sep 3, 2024 · Updated last year
Alternatives and similar repositories for pysparkling
Users who are interested in pysparkling are comparing it to the libraries listed below.
- A boilerplate for writing PySpark Jobs ☆395 · Jan 21, 2024 · Updated 2 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 · Dec 31, 2020 · Updated 5 years ago
- ☆25 · Jun 5, 2015 · Updated 10 years ago
- ☆36 · May 12, 2015 · Updated 10 years ago
- Data science repo to help others ☆12 · Feb 10, 2016 · Updated 10 years ago
- ☆34 · May 4, 2016 · Updated 9 years ago
- Uncertainty quantification book chapter ☆50 · Aug 13, 2015 · Updated 10 years ago
- Sparkling Pandas ☆364 · Jul 6, 2023 · Updated 2 years ago
- Git/Github Intro ☆13 · Jun 17, 2015 · Updated 10 years ago
- Framework for setting up predictive analytics services ☆488 · Apr 15, 2023 · Updated 2 years ago
- Computational Statistics II Tutorial at SciPy 2015 ☆48 · Jul 15, 2015 · Updated 10 years ago
- PyTorch Flexible Hash Embeddings ☆28 · Feb 4, 2020 · Updated 6 years ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,541 · Dec 2, 2024 · Updated last year
- Unified interface for local and distributed ndarrays ☆157 · Oct 13, 2018 · Updated 7 years ago
- ☆28 · Jun 6, 2016 · Updated 9 years ago
- Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks ☆1,667 · Mar 16, 2024 · Updated last year
- ☆525 · Jan 1, 2026 · Updated last month
- Learn the pyspark API through pictures and simple examples ☆170 · Jan 23, 2021 · Updated 5 years ago
- A collection of documents and materials for the EMNLP-2015 Semantic Similarity tutorial ☆30 · Sep 30, 2015 · Updated 10 years ago
- Scikit-learn-compatible datasets ☆16 · Oct 4, 2025 · Updated 4 months ago
- Word2Vec models with Twitter data using Spark. Blog: ☆66 · Jan 15, 2019 · Updated 7 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,363 · Sep 9, 2025 · Updated 5 months ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 · Sep 20, 2019 · Updated 6 years ago
- Data Migration for the Blaze Project ☆1,005 · Jul 15, 2022 · Updated 3 years ago
- A library for factorization machines and polynomial networks for classification and regression in Python. ☆245 · Aug 7, 2020 · Updated 5 years ago
- Evaluate code in markdown ☆43 · Aug 11, 2015 · Updated 10 years ago
- Joblib Apache Spark Backend ☆249 · Apr 7, 2025 · Updated 10 months ago
- Functional Airflow DAG definitions. ☆38 · Jul 4, 2017 · Updated 8 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆682 · Mar 6, 2025 · Updated 11 months ago
- How to Bootstrap Internal Applications With IPython Widgets (PyData 2015) ☆19 · Nov 10, 2015 · Updated 10 years ago
- klab docker image building scripts ☆20 · Jul 10, 2019 · Updated 6 years ago
- Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models ☆489 · Aug 11, 2017 · Updated 8 years ago
- a nose plugin for finding and running IPython notebooks as nose tests ☆80 · Feb 17, 2022 · Updated 4 years ago
- Biased matrix factorisation using TensorFlow ☆19 · Jun 30, 2016 · Updated 9 years ago
- Helpers & syntactic sugar for PySpark. ☆62 · Dec 4, 2025 · Updated 2 months ago
- A JupyterLab extension to facilitate the discovery and installation of other extensions ☆47 · Jan 21, 2019 · Updated 7 years ago
- Benchmarks of artificial neural network library for Spark MLlib ☆11 · Dec 3, 2015 · Updated 10 years ago
- Factorization Machines for Julia ☆11 · Aug 26, 2016 · Updated 9 years ago
- Distributed Deep Learning on Spark ☆403 · Oct 8, 2016 · Updated 9 years ago