wdm0006 / DummyRDD
A pure Python mock of PySpark's RDD
☆27 Updated 6 years ago
Alternatives and similar repositories for DummyRDD:
Users who are interested in DummyRDD compare it to the libraries listed below.
- Apache (Py)Spark type annotations (stub files). ☆116 Updated 2 years ago
- Send summary messages of your Luigi jobs to Slack ☆46 Updated 5 years ago
- Slack notifications for the Luigi workflow manager ☆46 Updated 3 years ago
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆52 Updated 6 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 Updated last year
- Luigi Plugin for Hubot ☆35 Updated 8 years ago
- An example PySpark project with pytest ☆17 Updated 7 years ago
- Example unit tests for Apache Spark Python scripts using the py.test framework ☆84 Updated 8 years ago
- A wrapper for libhdfs3 to interact with HDFS from Python ☆136 Updated 4 years ago
- High Level Kafka Scanner ☆19 Updated 7 years ago
- A tool and library for easily deploying applications on Apache YARN ☆143 Updated 11 months ago
- Natural Language Processing with Spark's MLlib ☆62 Updated 7 years ago
- Dockerized setup for testing code on realistic hadoop clusters ☆27 Updated 4 years ago
- Ansible role to deploy and configure Airflow ☆41 Updated this week
- Utils around luigi. ☆65 Updated 4 years ago
- pytest plugin to run the tests with support of pyspark ☆85 Updated last year
- Airflow plugin to transfer arbitrary files between operators ☆78 Updated 6 years ago
- An example to illustrate using Luigi to manage a data science workflow in Greenplum Database ☆12 Updated 6 years ago
- A short guide for transitioning from Python to Scala ☆65 Updated 9 years ago
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 Updated last year
- An Apache Spark-shell backend for IPython ☆105 Updated 3 years ago
- Google BigQuery support for Spark, SQL, and DataFrames ☆155 Updated 5 years ago
- Conversion utility from Zeppelin notes to Jupyter notebooks. ☆44 Updated 5 years ago
- ☆11 Updated 5 years ago
- A pure Python implementation of Apache Spark's RDD and DStream interfaces. ☆268 Updated 6 months ago
- SQL on dataframes - pandas and dask ☆64 Updated 6 years ago
- Learn the pyspark API through pictures and simple examples ☆170 Updated 4 years ago
- Helpers & syntactic sugar for PySpark. ☆61 Updated last year
- Make your libraries magically appear in Databricks. ☆47 Updated last year
- Deploy dask on YARN clusters ☆69 Updated 6 months ago