qubole/spark-state-store

RocksDB state storage implementation for Structured Streaming.

☆17

Alternatives and similar repositories for spark-state-store

Users who are interested in spark-state-store are comparing it to the repositories listed below.

HeartSaVioR / spark-sql-kafka-offset-committer
Kafka offset committer for structured streaming query
☆41 · Feb 15, 2021 · Updated 5 years ago
qubole / spark-acid
ACID Data Source for Apache Spark based on Hive ACID
☆97 · Jul 7, 2021 · Updated 5 years ago
chermenin / spark-states
Custom state store providers for Apache Spark
☆92 · Feb 14, 2025 · Updated last year
zrlio / albis
Albis: High-Performance File Format for Big Data Systems
☆21 · Jul 12, 2018 · Updated 8 years ago
sirkon / ch-encode
Clickhouse typesafe RowBinary insert tooling
☆13 · Jul 6, 2019 · Updated 7 years ago
HeartSaVioR / spark-state-tools
Spark Structured Streaming State Tools
☆34 · Jul 3, 2020 · Updated 6 years ago
qubole / kinesis-sql
Kinesis Connector for Structured Streaming
☆139 · Jul 2, 2024 · Updated 2 years ago
PacktPublishing / Real-Time-Streaming-using-Apache-Spark-Streaming
Real Time Streaming using Apache Spark Streaming [Video], published by Packt
☆10 · Oct 31, 2022 · Updated 3 years ago
Lancern / cache-coherence-protocol-bench
Benchmarking code for evaluating the cost of cache coherence protocols implemented on different platforms
☆14 · Apr 13, 2021 · Updated 5 years ago
hortonworks-spark / cloud-integration
Spark cloud integration: tests, cloud committers and more
☆20 · Jan 30, 2025 · Updated last year
attilapiros / trace-agent
A java agent for tracing which can be configured via simple text file and instruments the code without rebuilding the project.
☆51 · Jul 12, 2026 · Updated last week
databricks-migrations / hadoop-profiler
☆20 · Mar 15, 2024 · Updated 2 years ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
multifacet / cbmm-artifact
Artifact package for CBMM paper (ATC'22)
☆11 · Jun 5, 2022 · Updated 4 years ago
TU-Berlin-DIMA / grizzly-prototype
Grizzly: Efficient Stream Processing Through Adaptive Query Compilation
☆17 · Jun 13, 2020 · Updated 6 years ago
newfront / spark-intro-to-ml
A Gentle introduction to Machine Learning with Apache Spark
☆11 · Mar 2, 2026 · Updated 4 months ago
joomcode / spark-platform
Basic Spark utilities
☆13 · Feb 20, 2025 · Updated last year
SAITPublic / PNMLibrary
SW Library for Samsung PNM (including functional simulator)
☆11 · Nov 2, 2023 · Updated 2 years ago
adamgfraser / 0-to-100-with-zio-test
☆14 · May 28, 2020 · Updated 6 years ago
typelevel / typelevel.g8
A typelevel.g8 based on sbt-typelevel
☆14 · Jul 17, 2026 · Updated last week
DmitryBe / clickhouse-kafka-connect
Ingress data from kafka topic into clickhouse table (JSON format)
☆24 · Apr 12, 2018 · Updated 8 years ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 · Nov 11, 2022 · Updated 3 years ago
valuko / TPCx-BB
Source code for TPCx-BB benchmark for Hive and SparkSQL on scale factor of 300 GB
☆10 · Jun 26, 2018 · Updated 8 years ago
ist-dsi / docker-kerberos
A kerberos KDC and a kerberos client in docker containers.
☆40 · Oct 25, 2020 · Updated 5 years ago
bluejoe2008 / spark-http-stream
spark structured streaming via HTTP communication
☆18 · Jul 7, 2022 · Updated 4 years ago
ysarch-lab / nimble_page_management_userspace
☆14 · Mar 29, 2019 · Updated 7 years ago
mark-hoffmann / fastteradata
Tools for faster and optimized interaction with Teradata and large datasets.
☆17 · Jul 11, 2018 · Updated 8 years ago
git-disl / FastSwap
Dynamic and Transparent Memory Sharing for Accelerating Big Data Analytics Workloads in Virtualized Cloud
☆16 · Feb 13, 2017 · Updated 9 years ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
fdeantoni / spark-websocket-datasource
A sample custom Spark Structured Streaming Datasource with Websockets
☆12 · May 14, 2020 · Updated 6 years ago
ronald-smith-angel / owl-data-sanitizer
A pyspark lib to validate data quality
☆19 · Nov 11, 2022 · Updated 3 years ago
javieraviles / spring-boot-redis-rest
API REST boilerplate using Spring Boot and Redis as database
☆13 · Dec 26, 2018 · Updated 7 years ago
divyam-rai / simple-kafka-sasl-docker-python
Due to lack of resources on how to deploy kafka with simple SASL authentication (just username and password) and how to write producer an…
☆12 · Dec 29, 2021 · Updated 4 years ago
pcodding / stream-simulator
Streaming Data Simulator
☆17 · Oct 12, 2020 · Updated 5 years ago
aljoscha / blog
Thoughts on things I find interesting.
☆17 · Dec 19, 2024 · Updated last year
shwethags / atlas-lineage
Example to create lineage in Atlas with sqoop and spark
☆14 · Apr 5, 2017 · Updated 9 years ago
multifacet / 0sim-workspace
Tools and experiments for 0sim. Simulate system software behavior on machines with terabytes of main memory from your desktop.
☆22 · May 27, 2020 · Updated 6 years ago
GTkernel / cori-sim
Simulation infrastructure and validation of Cori
☆13 · Mar 22, 2022 · Updated 4 years ago
palantir / k8s-spark-scheduler
A Kubernetes Scheduler Extender to provide gang scheduling support for Spark on Kubernetes
☆179 · Apr 23, 2023 · Updated 3 years ago