GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

☆848

Alternatives and similar repositories for DataflowJavaSDK

Users who are interested in DataflowJavaSDK are comparing it to the libraries listed below.

GoogleCloudPlatform / DataflowSDK-examples
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines. This re…
☆167 · Jul 25, 2018 · Updated 7 years ago
dataArtisans / flink-dataflow
Google Dataflow Runner for Apache Flink™ (deprecated; please use the up-to-date Beam Runner)
☆88 · Jul 7, 2016 · Updated 10 years ago
GoogleCloudPlatform / DataflowPythonSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
☆163 · May 31, 2017 · Updated 9 years ago
apache / beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
☆8,636 · Updated this week
apache / apex-core
Mirror of Apache Apex core
☆350 · Jun 7, 2021 · Updated 5 years ago
googlegenomics / dataflow-java
Google Cloud Dataflow pipelines such as Identity-By-State as well as useful utility classes.
☆38 · Aug 9, 2023 · Updated 2 years ago
spotify / scio
A Scala API for Apache Beam and Google Cloud Dataflow.
☆2,626 · Jul 14, 2026 · Updated last week
googlearchive / cloud-pubsub-samples-java
Cloud Pub/Sub sample applications with Java
☆52 · Jan 22, 2020 · Updated 6 years ago
amplab / keystone
Simplifying robust end-to-end machine learning on Apache Spark.
☆473 · Apr 18, 2017 · Updated 9 years ago
GoogleCloudPlatform / spark-examples
Spark pipelines that correspond to a series of Dataflow examples.
☆27 · May 5, 2019 · Updated 7 years ago
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago
GoogleCloudPlatform / processing-logs-using-dataflow
Processing Logs at Scale using Cloud Dataflow
☆60 · Mar 18, 2019 · Updated 7 years ago
iconara / bigshift
A tool for moving tables from Redshift to BigQuery
☆65 · Jan 20, 2019 · Updated 7 years ago
apache / incubator-heron
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
☆3,629 · Mar 1, 2023 · Updated 3 years ago
pulsarIO / realtime-analytics
Realtime analytics, including the core components of the Pulsar pipeline.
☆650 · Nov 6, 2015 · Updated 10 years ago
sryza / spark-timeseries
A library for time series analysis on Apache Spark
☆1,197 · Oct 13, 2020 · Updated 5 years ago
twitter / scalding
A Scala API for Cascading
☆3,522 · May 28, 2023 · Updated 3 years ago
GoogleCloudPlatform / cloud-pubsub-logging-python
Small Python logging handlers that send logs directly to Cloud Pub/Sub
☆23 · Jul 24, 2020 · Updated 5 years ago
GoogleCloudPlatform / cloud-bigtable-examples
Examples of how to use Cloud Bigtable both with GCE map/reduce and in standalone applications.
☆236 · Mar 25, 2026 · Updated 3 months ago
googledatalab / datalab
Interactive tools and developer experiences for Big Data on Google Cloud Platform.
☆978 · Sep 2, 2022 · Updated 3 years ago
twitter / summingbird
Streaming MapReduce with Scalding and Storm
☆2,123 · Jan 19, 2022 · Updated 4 years ago
memsql / streamliner-starter
Starter project for building MemSQL Streamliner Pipelines
☆32 · Apr 18, 2017 · Updated 9 years ago
apache / beam-site
Apache Beam Site
☆30 · Jul 8, 2026 · Updated last week
yahoo / streaming-benchmarks
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
☆647 · Dec 17, 2023 · Updated 2 years ago
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,270 · Jun 24, 2026 · Updated 3 weeks ago
apache / zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
☆6,644 · Updated this week
addthis / stream-lib
Stream summarizer and cardinality estimator.
☆2,265 · Nov 28, 2019 · Updated 6 years ago
twitter-archive / ambrose
A platform for visualization and real-time monitoring of data workflows
☆1,170 · Jan 22, 2020 · Updated 6 years ago
Stratio / Decision
Powered by Spark Streaming & Siddhi
☆317 · Feb 11, 2020 · Updated 6 years ago
brightcove-archive / ooyala_spark-jobserver
REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver…
☆345 · May 19, 2017 · Updated 9 years ago
gearpump / gearpump
Lightweight real-time big data streaming engine over Akka
☆756 · Jul 14, 2026 · Updated last week
YahooArchive / samoa
SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams.
☆427 · Mar 28, 2016 · Updated 10 years ago
mesos / myriad
https://github.com/apache/incubator-myriad is our new home. See
☆251 · Dec 2, 2015 · Updated 10 years ago
adobe-research / spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
☆330 · Mar 28, 2015 · Updated 11 years ago
cdapio / tephra
Apache Tephra: Transactions for HBase.
☆159 · Sep 13, 2024 · Updated last year
Netflix / suro
Netflix's distributed Data Pipeline
☆796 · Apr 10, 2023 · Updated 3 years ago
apache / predictionio
PredictionIO, a machine learning server for developers and ML engineers.
☆12,521 · Jan 9, 2021 · Updated 5 years ago
googleapis / google-cloud-java
Google Cloud Client Library for Java
☆2,070 · Updated this week
spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
☆3,142 · May 16, 2023 · Updated 3 years ago