Apache (Py)Spark type annotations (stub files).
☆118 · Aug 17, 2022 · Updated 3 years ago
Alternatives and similar repositories for pyspark-stubs
Users who are interested in pyspark-stubs compare it to the libraries listed below.
- Asynchronous actions for PySpark ☆48 · Dec 2, 2021 · Updated 4 years ago
- Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks ☆360 · Jun 6, 2017 · Updated 8 years ago
- Helpers & syntactic sugar for PySpark. ☆62 · Dec 4, 2025 · Updated 3 months ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆683 · Mar 6, 2025 · Updated last year
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 5 months ago
- A pyspark lib to validate data quality ☆18 · Nov 11, 2022 · Updated 3 years ago
- Schema Registry integration for Apache Spark ☆40 · Nov 16, 2022 · Updated 3 years ago
- Spark functions to run popular phonetic and string matching algorithms ☆59 · Feb 22, 2022 · Updated 4 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆816 · Feb 27, 2026 · Updated last week
- Mirror of Apache Toree (Incubating) ☆749 · Feb 27, 2026 · Updated last week
- Spark data profiling utilities ☆23 · Nov 24, 2018 · Updated 7 years ago
- A Spark Atlas connector to track data lineage in Apache Atlas ☆266 · Nov 16, 2022 · Updated 3 years ago
- Storm Database Explorer - Developing Data Products course project. ☆11 · May 3, 2017 · Updated 8 years ago
- Spark style guide ☆271 · Sep 30, 2024 · Updated last year
- Tools to deploy Hadoop on EMC Isilon ☆17 · Jul 27, 2016 · Updated 9 years ago
- Base classes to use when writing tests with Spark ☆1,549 · Dec 22, 2025 · Updated 2 months ago
- Demonstrates how to submit a job to Spark on HDP directly via YARN's REST API from any workstation ☆23 · Apr 18, 2016 · Updated 9 years ago
- This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server ☆50 · Jul 16, 2023 · Updated 2 years ago
- SparklingGraph provides an easy-to-use set of features that lets you process large-scale graphs using Spark and GraphX. ☆153 · Jul 31, 2020 · Updated 5 years ago
- A boilerplate for writing PySpark Jobs ☆395 · Jan 21, 2024 · Updated 2 years ago
- Python module for Named Entity Recognition (NER) using natural language processing. ☆13 · May 30, 2021 · Updated 4 years ago
- 🔌 Flask S3Viewer is a powerful extension that makes it easy to browse S3 in any Flask application. (Python S3 Uploader / Flask S3 Upload… ☆14 · Jan 8, 2025 · Updated last year
- Postgres extension drivers for quill ☆15 · Oct 31, 2016 · Updated 9 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆17 · Jan 12, 2017 · Updated 9 years ago
- pytest plugin to run the tests with support of pyspark ☆88 · May 21, 2025 · Updated 9 months ago
- A Hivemall wrapper for Spark ☆31 · Apr 21, 2016 · Updated 9 years ago
- Real-world Spark pipelines examples ☆83 · Feb 27, 2018 · Updated 8 years ago
- ☆15 · Jan 13, 2018 · Updated 8 years ago
- Manage Apache Atlas and Ranger configuration for your Hadoop environment. ☆16 · May 4, 2021 · Updated 4 years ago
- sparkql: Apache Spark SQL DataFrame schema management for sensible humans ☆12 · Sep 18, 2023 · Updated 2 years ago
- Spark NLP for Streamlit ☆15 · Sep 12, 2021 · Updated 4 years ago
- Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover… ☆13 · May 3, 2019 · Updated 6 years ago
- low-level helpers for Apache Spark libraries and tests ☆16 · Dec 29, 2018 · Updated 7 years ago
- Type stubs for the tensorflow library ☆17 · Aug 30, 2018 · Updated 7 years ago
- Redis search and indexing in Java ☆16 · Sep 26, 2016 · Updated 9 years ago
- Integrate Apache Spark with Citus distributed Postgres ☆17 · Apr 3, 2019 · Updated 6 years ago
- Flowchart for debugging Spark applications ☆106 · Sep 25, 2024 · Updated last year
- Hadoop utility jar for troubleshooting integration with cloud object stores ☆37 · Updated this week
- HandySpark - bringing pandas-like capabilities to Spark dataframes ☆199 · May 19, 2019 · Updated 6 years ago