HandySpark - bringing pandas-like capabilities to Spark dataframes
☆199 · May 19, 2019 · Updated 6 years ago
Alternatives and similar repositories for handyspark
Users that are interested in handyspark are comparing it to the libraries listed below.
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames. ☆102 · Aug 20, 2019 · Updated 6 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆17 · Jan 12, 2017 · Updated 9 years ago
- Interactive visualizations for differential expression ☆25 · Dec 8, 2022 · Updated 3 years ago
- Run FeatureTools to automate feature engineering in a distributed fashion on Spark. ☆11 · Oct 11, 2018 · Updated 7 years ago
- Examples and custom Spark images for working with the spark-on-k8s operator on AWS ☆26 · Feb 14, 2021 · Updated 5 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 6 months ago
- Apache (Py)Spark type annotations (stub files). ☆118 · Aug 17, 2022 · Updated 3 years ago
- Example of orchestrating dependent Databricks jobs using Airflow ☆11 · Dec 19, 2019 · Updated 6 years ago
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp… ☆818 · Mar 4, 2026 · Updated 3 weeks ago
- Easy-to-use library to bring TensorFlow to Apache Spark ☆296 · Oct 11, 2023 · Updated 2 years ago
- Snippets of code used in blog posts and other media. ☆13 · Nov 11, 2025 · Updated 4 months ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,539 · Dec 2, 2024 · Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆187 · Oct 15, 2025 · Updated 5 months ago
- Notes and code on hyperparameter tuning techniques, with some hacking around ☆23 · May 11, 2018 · Updated 7 years ago
- ☆17 · Feb 14, 2026 · Updated last month
- Port of the TPC-DS data generator to Java ☆13 · Aug 1, 2017 · Updated 8 years ago
- Dockerized setup for testing code on realistic Hadoop clusters ☆26 · Jul 20, 2020 · Updated 5 years ago
- Cloud Spanner Connector for Apache Spark ☆17 · Mar 17, 2026 · Updated last week
- Mirror of Apache Toree (Incubating) ☆749 · Updated this week
- General functions for your data .pipe()-lines. ☆17 · Nov 8, 2023 · Updated 2 years ago
- A low-overhead sampling profiler for PySpark that outputs Flame Graphs ☆16 · Dec 17, 2020 · Updated 5 years ago
- ☆13 · Jan 30, 2023 · Updated 3 years ago
- ☆11 · Dec 26, 2022 · Updated 3 years ago
- Google Spreadsheets datasource for SparkSQL and DataFrames ☆57 · Jul 24, 2023 · Updated 2 years ago
- An open-source Python library for automated feature engineering ☆7,626 · Feb 3, 2026 · Updated last month
- ☆53 · May 10, 2018 · Updated 7 years ago
- API for converting JVM objects to representations by MIME type, for the Jupyter ecosystem. ☆25 · Jan 16, 2020 · Updated 6 years ago
- Monitor Apache Spark from Jupyter Notebook ☆172 · May 16, 2022 · Updated 3 years ago
- JupyterLab Notebook for Mesosphere DC/OS ☆11 · Aug 6, 2019 · Updated 6 years ago
- PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 ☆88 · Jan 3, 2020 · Updated 6 years ago
- Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code. ☆2,239 · Jun 27, 2024 · Updated last year
- Sample code for integrating Data Catalog with a Hive data source. ☆24 · Jan 29, 2025 · Updated last year
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 6 months ago
- Apache Spark OpenCPU Executor (ROSE) ☆25 · Jun 16, 2018 · Updated 7 years ago
- Bay Area: what to do on the weekend ☆10 · Apr 1, 2019 · Updated 6 years ago
- ☆31 · Oct 14, 2019 · Updated 6 years ago
- SFTP server that works on top of HDFS. It is based on Apache sshd and lets you access and operate HDFS through the SFTP protocol. ☆15 · Aug 18, 2023 · Updated 2 years ago
- [UNMAINTAINED] A starter pack for creating a lightweight responsive web app for Fast.AI PyTorch models. ☆16 · Dec 5, 2018 · Updated 7 years ago
- A very simple way to deploy any machine learning model using Azure Functions ☆29 · Jan 6, 2019 · Updated 7 years ago