ericxiao251/spark-syntax

ericxiao251 / spark-syntax

This is a repo documenting the best practices in PySpark.

☆465

Alternatives and similar repositories for spark-syntax

Users that are interested in spark-syntax are comparing it to the libraries listed below.

UrbanInstitute / pyspark-tutorials
Code snippets and tutorials for working with social science data in PySpark
☆416 · Aug 11, 2017 · Updated 8 years ago
d6t / d6tflow
Python library for building highly effective data science workflows
☆947 · Jun 28, 2026 · Updated last week
yennanliu / spark-etl-pipeline
Various data stream/batch process demo with Apache Scala Spark 🚀
☆12 · Feb 28, 2020 · Updated 6 years ago
drabastomek / learningPySpark_video
Learning PySpark video series
☆11 · Mar 5, 2018 · Updated 8 years ago
danielvdende / data-testing-with-airflow
☆203 · Jun 8, 2026 · Updated last month
awesome-spark / spark-gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
☆359 · Jun 6, 2017 · Updated 9 years ago
hammerlab / spark-util
low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
guillaume-chevalier / How-to-Grow-Neat-Software-Architecture-out-of-Jupyter-Notebooks
Growing the code out of your notebooks - the right way.
☆530 · Nov 6, 2022 · Updated 3 years ago
XD-DENG / Spark-practice
Apache Spark (PySpark) Practice on Real Data
☆270 · Jan 31, 2020 · Updated 6 years ago
jadianes / spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
☆1,659 · Mar 16, 2024 · Updated 2 years ago
kevinschaich / pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
☆691 · Feb 21, 2023 · Updated 3 years ago
kkyon / botflow
Python Fast Dataflow programming framework for data pipeline work (Web Crawler, Machine Learning, Quantitative Trading, etc.)
☆1,196 · Feb 3, 2026 · Updated 5 months ago
TomAugspurger / dask-tutorial-odsc-2018
☆15 · Oct 27, 2022 · Updated 3 years ago
awesome-spark / awesome-spark
A curated list of awesome Apache Spark packages and resources.
☆1,883 · Feb 27, 2026 · Updated 4 months ago
nteract / coffee_boat
☕⛵ WIP PySpark dependency management
☆22 · Jul 8, 2018 · Updated 8 years ago
gtoonstra / etl-with-airflow
ETL best practices with airflow, with examples
☆1,355 · Sep 25, 2024 · Updated last year
zhengzhugithub / AwesomeComputerVision
Awesome Computer Vision Resources
☆85 · Feb 22, 2019 · Updated 7 years ago
ekampf / PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
☆393 · Jan 21, 2024 · Updated 2 years ago
mara / mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
☆2,089 · Dec 15, 2023 · Updated 2 years ago
bfemiano / song_plays_workshop_tutorial
Song Plays Workshop Tutorial
☆13 · Nov 19, 2020 · Updated 5 years ago
TedBear42 / spark_training
Sample Spark Code
☆91 · Sep 19, 2018 · Updated 7 years ago
airflow-plugins / pandora-plugin
Plugin offering views, operators, sensors, and more developed at Pandora Media.
☆26 · May 3, 2018 · Updated 8 years ago
qubole / spark-acid
ACID Data Source for Apache Spark based on Hive ACID
☆97 · Jul 7, 2021 · Updated 5 years ago
palantir / pyspark-style-guide
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring…
☆1,251 · Sep 8, 2025 · Updated 10 months ago
jghoman / awesome-apache-airflow
Curated list of resources about Apache Airflow
☆3,920 · May 7, 2026 · Updated 2 months ago
taki0112 / Tensorflow-Cookbook
Simple Tensorflow Cookbook for easy-to-use
☆2,747 · Feb 9, 2020 · Updated 6 years ago
rushilgupta / dronecontrol
An Alexa skill to control a Parrot minidrone by voice.
☆63 · Mar 23, 2020 · Updated 6 years ago
AasTrailblazers / AzureSynapse
☆15 · Jan 17, 2022 · Updated 4 years ago
robinhood / faust
Python Stream Processing
☆6,822 · Jul 27, 2024 · Updated last year
airbnb / knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
☆5,534 · Sep 4, 2024 · Updated last year
howonlee / twostrangethings
two strange things to do with neural nets
☆15 · Feb 18, 2019 · Updated 7 years ago
santinic / pampy
Pampy: The Pattern Matching for Python you always dreamed of.
☆3,526 · Jan 16, 2025 · Updated last year
andkret / Cookbook
The Data Engineering Cookbook
☆15,167 · Jun 12, 2026 · Updated 3 weeks ago
cgarciae / pypeln
Concurrent data pipelines in Python >>>
☆1,596 · Jul 20, 2023 · Updated 2 years ago
cortexlabs / cortex
Production infrastructure for machine learning at scale
☆8,011 · Jun 12, 2024 · Updated 2 years ago
hi-primus / optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
☆1,535 · Dec 2, 2024 · Updated last year
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,459 · May 12, 2026 · Updated last month
mahmoudparsian / pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
☆1,279 · May 26, 2025 · Updated last year
malexer / pytest-spark
pytest plugin to run the tests with support of pyspark
☆88 · May 21, 2025 · Updated last year