Bergvca/pyspark_dist_explore

Data Exploration in PySpark made easy: pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.

☆102

Alternatives and similar repositories for pyspark_dist_explore

Users who are interested in pyspark_dist_explore are comparing it to the libraries listed below.

daniel-acuna / pyspark_pipes
Helper functions for building complex Spark ML pipelines
☆12 · Apr 10, 2018 · Updated 8 years ago
dvgodoy / handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
☆199 · May 19, 2019 · Updated 7 years ago
julioasotodv / spark-df-profiling
Create HTML profiling reports from Apache Spark DataFrames
☆197 · Feb 2, 2020 · Updated 6 years ago
hibayesian / spark-lof
A parallel implementation of local outlier factor based on Spark
☆17 · Jan 26, 2022 · Updated 4 years ago
rozester / LinkedIn-Comments-Analyzer
Extracts LinkedIn comments from any post and exports them to an Excel file
☆23 · Oct 17, 2018 · Updated 7 years ago
drabastomek / learningPySpark_video
Learning PySpark video series
☆11 · Mar 5, 2018 · Updated 8 years ago
mattyb149 / nifi-client
A NiFi client library for JVM languages
☆13 · Mar 18, 2016 · Updated 10 years ago
AndrewRook / ptplot
Easily make interactive plots of player-tracking data
☆11 · Sep 20, 2021 · Updated 4 years ago
fugue-project / tune
An abstraction layer for parameter tuning
☆35 · Dec 16, 2025 · Updated 7 months ago
edgararuiz-zz / bigdatalondon2018
Materials from the Data Science with Spark and R
☆21 · Nov 15, 2018 · Updated 7 years ago
idealo / jenkins-ci
Minimal example to set up a Jenkins-CI pipeline for data science projects on OpenShift in a couple of minutes.
☆27 · Jan 7, 2025 · Updated last year
krishnan-r / sparkmonitor
Monitor Apache Spark from Jupyter Notebook
☆172 · May 16, 2022 · Updated 4 years ago
msesia / cqr-comparison
A comparison of some conformal quantile regression methods.
☆12 · Sep 14, 2019 · Updated 6 years ago
amesar / spark-python-scala-udf
Demonstrates calling a Scala UDF from Python using spark-submit with an EGG and JAR
☆23 · Mar 3, 2020 · Updated 6 years ago
0liu / ibstract
Asynchronous financial data management
☆22 · Oct 3, 2017 · Updated 8 years ago
zero323 / pyspark-asyncactions
Asynchronous actions for PySpark
☆47 · Dec 2, 2021 · Updated 4 years ago
hipic / biz_data_LA
Business Data Analysis by HiPIC of CalStateLA
☆21 · Oct 26, 2018 · Updated 7 years ago
Affirm / shparkley
Spark implementation for computing Shapley values using Monte Carlo approximation
☆80 · Mar 20, 2023 · Updated 3 years ago
tranceitionalMynd / spy-reaper
Algorithmic trading application for use with Interactive Brokers
☆17 · Jan 9, 2018 · Updated 8 years ago
gabrielspmoreira / kaggle_outbrain_click_prediction_google_cloud_ml_engine
A POC of Google's Wide & Deep Learning models deployed on Google Cloud ML Engine for Kaggle's Outbrain Click Competition
☆36 · Jun 19, 2018 · Updated 8 years ago
choldgraf / makeitpop
Warp your data like Jet warps your perception
☆13 · Feb 16, 2024 · Updated 2 years ago
PacktPublishing / Apache-Spark-2x-Machine-Learning-Cookbook
Apache Spark 2x Machine Learning Cookbook, published by Packt
☆33 · Jul 23, 2025 · Updated last year
hanhanwu / Hanhan_Data_Science_Practice
data analysis, big data development, cloud, and any other cool things!
☆31 · Jul 30, 2024 · Updated last year
BlueMetal / iot-edge-dynocard
☆11 · Jan 4, 2023 · Updated 3 years ago
zhangzhang10 / pydaal-tutorials
Tutorials for using PyDAAL, the Python API of the Intel Data Analytics Acceleration Library
☆11 · Apr 13, 2018 · Updated 8 years ago
qingyuan18 / finetune-vicuna-on-sagemaker
☆10 · Sep 7, 2023 · Updated 2 years ago
Howuhh / link_pred_spark
Similarity between graph nodes based on local information, with PySpark
☆10 · Sep 30, 2022 · Updated 3 years ago
bitner / lambda-mapproxy
☆10 · Oct 10, 2023 · Updated 2 years ago
AidanCooper / shap-clustering
How to use SHAP values for better cluster analysis
☆60 · May 15, 2022 · Updated 4 years ago
scalingpythonml / scalingpythonml
Scaling Python Machine Learning
☆53 · Sep 7, 2023 · Updated 2 years ago
exploripy / exploripy
Pre-modelling analysis of data through exploratory data analysis and statistical tests.
☆51 · Aug 17, 2023 · Updated 2 years ago
piratepeel / neoSBM
A new type of SBM
☆18 · Nov 18, 2019 · Updated 6 years ago
amesar / mlflow-fun
MLflow samples - deprecated
☆22 · May 9, 2023 · Updated 3 years ago
databricks-industry-solutions / hls-payer-mrf-sparkstreaming
Spark Structured Streaming for Payer MRF use case
☆15 · Nov 20, 2025 · Updated 8 months ago
mdrasmus / compbio
Python libraries and utilities for computational biology
☆37 · Jun 6, 2014 · Updated 12 years ago
Ivo-Balbaert / start_julia
Code of the book "Getting started with the Julia Programming Language"
☆11 · Jul 7, 2018 · Updated 8 years ago
slavivanov / cats_dogs_kaggle
My code for the kaggle Cats and Dogs Redux competition. Placed in top 8%.
☆13 · Mar 23, 2017 · Updated 9 years ago
aws-samples / aws-analytics-immersion-day
Describes the concepts of lambda architecture and the actual deployment process with an example of building a serverless business intelli…
☆15 · Jun 10, 2025 · Updated last year
feriat / meduza.io-parse-likes
My first Habrahabr post
☆13 · May 5, 2016 · Updated 10 years ago