dvgodoy/handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

☆199

Alternatives and similar repositories for handyspark

Users interested in handyspark are comparing it to the libraries listed below.

Bergvca / pyspark_dist_explore
Data exploration in PySpark made easy: pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.
☆102 · Aug 20, 2019 · Updated 6 years ago
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,372 · Mar 20, 2024 · Updated 2 years ago
bernhard-42 / pyspark-atlas
PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection
☆17 · Jan 12, 2017 · Updated 9 years ago
sllynn / spark-xgboost
A Python wrapper for XGBoost4J-Spark classes.
☆46 · Apr 12, 2024 · Updated 2 years ago
dvgodoy / deepreplay
Deep Replay - Generate visualizations as in my "Hyper-parameters in Action!" series!
☆286 · Mar 24, 2023 · Updated 3 years ago
duckdblabs / duckdb-substrait-demo
☆17 · Jan 17, 2023 · Updated 3 years ago
pan5431333 / featuretools4s
Run Featuretools to automate feature engineering in a distributed fashion on Spark.
☆11 · Oct 11, 2018 · Updated 7 years ago
MarcKaminski / spark-FeatureSelection
Feature selection methods as Spark MLlib pipelines
☆30 · Apr 29, 2018 · Updated 8 years ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
amesar / spark-python-scala-udf
Demonstrates calling a Scala UDF from Python using spark-submit with an EGG and JAR
☆23 · Mar 3, 2020 · Updated 6 years ago
cguegi / azure-databricks-airflow-example
Example of orchestrating dependent Databricks jobs using Airflow
☆11 · Dec 19, 2019 · Updated 6 years ago
LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 · May 19, 2026 · Updated 2 months ago
databricks / benchmarks
A place in which we publish scripts for reproducible benchmarks.
☆105 · Dec 13, 2019 · Updated 6 years ago
devlace / azure-databricks-recommendation
An end-to-end Recommendation System built on Azure Databricks
☆56 · Jul 29, 2019 · Updated 7 years ago
asvcode / 1_cycle
A disciplined approach to neural network parameters: reviewing Leslie Smith's approach to setting hyper-parameters
☆12 · Jul 18, 2018 · Updated 8 years ago
lifeomic / sparkflow
Easy-to-use library to bring TensorFlow to Apache Spark
☆295 · Oct 11, 2023 · Updated 2 years ago
devlace / azure-databricks-anomaly
Anomaly Detection Pipeline on Azure Databricks
☆28 · Jul 29, 2019 · Updated 7 years ago
tswast / code-snippets
Snippets of code used in blog posts and other media.
☆13 · Jul 15, 2026 · Updated 2 weeks ago
hi-primus / optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
☆1,536 · Dec 2, 2024 · Updated last year
szilard / GBM-multicore
GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
☆20 · May 13, 2018 · Updated 8 years ago
swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
neo4j-contrib / training-v2
☆18 · Feb 14, 2026 · Updated 5 months ago
kapelner / ICEbox
An R package for better visualizing a statistical learning model
☆35 · Jan 12, 2026 · Updated 6 months ago
JohnCoene / echarts4rShiny
Demo of Shiny working with echarts4r
☆16 · Feb 18, 2020 · Updated 6 years ago
jcrist / hadoop-test-cluster
Dockerized setup for testing code on realistic Hadoop clusters
☆26 · Jul 20, 2020 · Updated 6 years ago
aknvictor / BitcoinMAvgs
☆11 · Dec 26, 2022 · Updated 3 years ago
jamespic / pyspark-flame
A low-overhead sampling profiler for PySpark that outputs flame graphs
☆16 · Dec 17, 2020 · Updated 5 years ago
pola-rs / valves
General functions for your data .pipe()-lines.
☆17 · Nov 8, 2023 · Updated 2 years ago
narenst / infinity
AWS Spot instances for ML
☆39 · Mar 21, 2023 · Updated 3 years ago
rstebbing / workshop
☆13 · Jan 30, 2023 · Updated 3 years ago
alteryx / featuretools
An open source python library for automated feature engineering
☆7,666 · Updated this week
databrickslabs / automl-toolkit
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, Mo…
☆191 · Jun 1, 2021 · Updated 5 years ago
rstudio / sparkDemos
☆52 · May 10, 2018 · Updated 8 years ago
potix2 / spark-google-spreadsheets
Google Spreadsheets data source for Spark SQL and DataFrames
☆58 · Jul 24, 2023 · Updated 3 years ago
GoogleCloudDataproc / spark-spanner-connector
Cloud Spanner Connector for Apache Spark
☆18 · Jul 21, 2026 · Updated last week
krishnan-r / sparkmonitor
Monitor Apache Spark from Jupyter Notebook
☆172 · May 16, 2022 · Updated 4 years ago
kensuio-oss / NLP-LSTM-Spark
Project for the talk on NLP using LSTM implementation from DL4J on Spark
☆20 · May 6, 2016 · Updated 10 years ago
jupyter / jvm-repr
API for converting JVM objects to representations by MIME type, for the Jupyter ecosystem.
☆26 · Jan 16, 2020 · Updated 6 years ago
dcos-labs / dcos-jupyterlab-service
JupyterLab Notebook for Mesosphere DC/OS
☆11 · Aug 6, 2019 · Updated 6 years ago