hortonworks-spark / cloud-integration

Spark cloud integration: tests, cloud committers and more

☆20

Alternatives and similar repositories for cloud-integration

Users who are interested in cloud-integration are comparing it to the libraries listed below.

hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆40 Updated 2 years ago
HeartSaVioR / spark-state-tools
Spark Structured Streaming State Tools
☆34 Updated 5 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 Updated 4 years ago
ExpediaGroup / circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
☆90 Updated last year
maropu / spark-sql-server
Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol
☆34 Updated 3 years ago
mkuthan / example-spark-kafka
Apache Spark and Apache Kafka integration example
☆124 Updated 7 years ago
FINRAOS / MegaSparkDiff
A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…
☆52 Updated 4 months ago
ansrivas / spark-structured-streaming
Spark structured streaming with Kafka data source and writing to Cassandra
☆62 Updated 5 years ago
egen / spark-sftp
Spark connector for SFTP
☆98 Updated 2 years ago
rdblue / s3committer
Hadoop output committers for S3
☆111 Updated 5 years ago
zalando-incubator / spark-json-schema
JSON schema parser for Apache Spark
☆82 Updated 3 years ago
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51 Updated 5 years ago
SponsorPay / jaquet
Spark stream from kafka(json) to s3(parquet)
☆15 Updated 6 years ago
chermenin / spark-states
Custom state store providers for Apache Spark
☆92 Updated 8 months ago
cloudera-labs / envelope
Build configuration-driven ETL pipelines on Apache Spark
☆161 Updated 3 years ago
lensesio / kafka-connect-query-language
SQL for Kafka Connectors
☆99 Updated last year
ibm-research-ireland / sparkoscope
Enabling Spark Optimization through Cross-stack Monitoring and Visualization
☆47 Updated 8 years ago
hortonworks-spark / spark-hive-streaming-sink
A sink to save Spark Structured Streaming DataFrame into Hive table
☆23 Updated 7 years ago
hammerlab / yarn-logs-helpers
Scripts for parsing / making sense of yarn logs
☆52 Updated 9 years ago
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆78 Updated 6 years ago
KeithSSmith / spark-compaction
File compaction tool that runs on top of the Spark framework.
☆59 Updated 6 years ago
nerdammer / spark-additions
Utilities for Apache Spark
☆34 Updated 9 years ago
ottogroup / schedoscope
Schedoscope is a scheduling framework for painfree agile development, testing, (re)loading, and monitoring of your datahub, lake, or what…
☆96 Updated 5 years ago
holdenk / spark-validator
A library you can include in your Spark job to validate the counters and perform operations on success. Goal is scala/java/python support…
☆108 Updated 7 years ago
airbnb / sputnik
☆63 Updated 5 years ago
hbutani / spark-datetime
functionstest
☆33 Updated 9 years ago
qubole / spark-acid
ACID Data Source for Apache Spark based on Hive ACID
☆97 Updated 4 years ago
51zero / eel-sdk
Big Data Toolkit for the JVM
☆145 Updated 4 years ago
CoxAutomotiveDataSolutions / waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76 Updated last year
palantir / spark-influx-sink
A Spark metrics sink that pushes to InfluxDb
☆51 Updated 4 years ago