anguenot / pyspark-cassandra
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible with Spark 2.0, 2.1, 2.2, 2.3, and 2.4.
☆69 Updated 3 months ago
Alternatives and similar repositories for pyspark-cassandra:
Users who are interested in pyspark-cassandra are comparing it to the libraries listed below.
- A convenient Python wrapper for Apache NiFi ☆251 Updated 3 months ago
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆174 Updated last year
- Jupyter kernel for scala and spark ☆187 Updated last year
- A Spark cluster setup running on Docker containers ☆60 Updated 5 years ago
- Apache (Py)Spark type annotations (stub files). ☆115 Updated 2 years ago
- Docker container for Kafka - Spark Streaming - Cassandra ☆97 Updated 5 years ago
- A client for the Confluent Schema Registry API implemented in Python ☆52 Updated last year
- Asynchronous actions for PySpark ☆47 Updated 3 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 Updated 5 years ago
- Notes about Spark Streaming in Apache Spark ☆58 Updated 7 years ago
- ☆517 Updated 2 years ago
- A tool and library for easily deploying applications on Apache YARN ☆142 Updated 10 months ago
- Spark Structured Streaming / Kafka / Cassandra / Elastic ☆183 Updated last year
- Serverless proxy for Spark cluster ☆325 Updated 4 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- An Integrated and collaborative cloud environment for building and running Spark applications on PKS/Kubernetes ☆81 Updated 4 years ago
- A Python MapReduce and HDFS API for Hadoop ☆237 Updated last year
- A full example of my blog post regarding Spark's stateful streaming (http://asyncified.io/2016/07/31/exploring-stateful-streaming-with-apa… ☆34 Updated 7 years ago
- Structured Streaming Machine Learning example with Spark 2.0 ☆92 Updated 7 years ago
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Ambari stack service for installing and managing Apache Airflow on HDP cluster ☆59 Updated 6 years ago
- Lightweight proxy to expose the UI of an Apache Spark cluster that is behind a firewall ☆98 Updated 4 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year
- ☆245 Updated 5 years ago
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different. ☆28 Updated 7 years ago
- Examples of Spark 2.0 ☆211 Updated 3 years ago
- type-class based data cleansing library for Apache Spark SQL ☆79 Updated 5 years ago
- Apache Spark and Apache Kafka integration example ☆123 Updated 7 years ago
- Hive SerDe for CSV ☆140 Updated 3 years ago
- Scripts for generating Grafana dashboards for monitoring Spark jobs ☆241 Updated 9 years ago