bloomberg/spark-flow

Library for organizing batch processing pipelines in Apache Spark

☆43

Alternatives and similar repositories for spark-flow

Users that are interested in spark-flow are comparing it to the libraries listed below.

faros-ai / faros-events-cli
💻 CLI for reporting events to Faros platform
☆15 · Jul 21, 2026 · Updated last week
markmo / featurestore
Building blocks and patterns for building data prep transformations and feature engineering in Spark.
☆16 · Mar 16, 2016 · Updated 10 years ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
mediative / sparrow
Scala library for converting Spark rows to case classes
☆11 · Mar 14, 2017 · Updated 9 years ago
hammerlab / spark-util
low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
holdenk / sparklingml
Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)
☆16 · Oct 14, 2019 · Updated 6 years ago
springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47 · Aug 1, 2016 · Updated 9 years ago
collectivemedia / spark-ext
Spark Extension : ML transformers, SQL aggregations, etc that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago
brson / paxos-for-dummies-like-me
☆11 · Nov 29, 2020 · Updated 5 years ago
ebiznext / spark-elasticsearch-mllib
ScalaIO 2014 Workshop
☆25 · Oct 23, 2014 · Updated 11 years ago
lensesio / kafka-testing
Repository for advanced unit-testing with embedded kafka services
☆25 · Dec 3, 2018 · Updated 7 years ago
rbrush / kite-apps
Prescriptive Applications over Kite and Hadoop
☆12 · Oct 14, 2015 · Updated 10 years ago
jeremybeard / oozieloop
Loops in Oozie
☆10 · Feb 15, 2015 · Updated 11 years ago
sparsecode / DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…
☆26 · Jun 7, 2021 · Updated 5 years ago
nordyke / akka-streams-examples
☆13 · May 26, 2017 · Updated 9 years ago
dansimpson / chronos
A small collection of abstractions for storing, traversing, and processing timeseries data in cassandra with hector
☆21 · Mar 27, 2020 · Updated 6 years ago
awslabs / amazon-s3-tagging-spark-util
☆12 · Oct 16, 2023 · Updated 2 years ago
d-e-n-t-y / pg_fdw_mv_rewrite
☆10 · Jul 31, 2019 · Updated 6 years ago
samhavens / be-manager
lessons I am learning
☆14 · Oct 24, 2019 · Updated 6 years ago
elazarl / hadoop_rpc_walktrhough
What happens on the wire when Hadoop RPC call is issued?
☆13 · Jul 1, 2022 · Updated 4 years ago
infinyon / flv-kf-protocol
native Rust implementation of Kafka protocol and api
☆14 · Jun 13, 2023 · Updated 3 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 · Mar 14, 2021 · Updated 5 years ago
jupadhya1 / REINFORCEMENT-LEARNING
Reinforcement Learning (RL), allows you to develop smart, quick and self-learning systems in your business surroundings. It is an effecti…
☆13 · Sep 25, 2019 · Updated 6 years ago
PacktPublishing / Mastering-Spark-for-Data-Science
Mastering Spark for Data Science, published by Packt
☆51 · Apr 22, 2026 · Updated 3 months ago
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51 · Jun 28, 2020 · Updated 6 years ago
uber / uberscriptquery
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
☆65 · Dec 17, 2023 · Updated 2 years ago
postgrespro / plantuner
☆10 · Feb 12, 2021 · Updated 5 years ago
gphat / wabisabi
Scala Asynchronous ElasticSearch HTTP Client
☆98 · Nov 1, 2017 · Updated 8 years ago
PacktPublishing / Mastering-PyTorch-for-Deep-Learning
Mastering PyTorch for Deep Learning, Published by Packt
☆14 · Jan 14, 2021 · Updated 5 years ago
hpclab / efficient-query-expansion
Official repository of "Efficient and Effective Query Expansion for Web Search", Short Paper @ CIKM 2018
☆15 · Nov 17, 2019 · Updated 6 years ago
fedragon / gameoflife-scalajs
Conway's Game of Life implemented in Scala.js
☆10 · Mar 30, 2018 · Updated 8 years ago
josephsweeney / versionDB
A versioned database inspired by Git
☆16 · Dec 16, 2017 · Updated 8 years ago
akanimax / NLP2SQL
A research and review of techniques to provide a natural language interface to RDMS.
☆10 · Dec 8, 2017 · Updated 8 years ago
superkley / udacity-aind
Udacity Artificial Intelligence Nanodegree - May 2017
☆16 · Nov 1, 2017 · Updated 8 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
viirya / SparkAffinityPropagation
Affinity Propagation on Spark
☆20 · May 31, 2021 · Updated 5 years ago
UDST / manta
Microsimulation Analysis for Network Traffic Assignment
☆18 · Oct 3, 2023 · Updated 2 years ago
jenciso / confluent-cluster
Playbook to provision a Confluent Cluster
☆10 · Oct 22, 2017 · Updated 8 years ago
pnowojski / simd-blog
Source code for SIMD benchmarks and experiments in Java
☆32 · Jun 30, 2017 · Updated 9 years ago