agile-lab-dev / wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

☆31

Alternatives and similar repositories for wasp

Users who are interested in wasp are comparing it to the libraries listed below.

ing-bank / scruid
Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
☆115 Updated 4 years ago
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆78 Updated 6 years ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆45 Updated 3 months ago
sjwiesman / flink-scala-3
☆36 Updated 3 years ago
smart-data-lake / smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
☆122 Updated this week
lightbend / cloudflow
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
☆323 Updated last year
springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47 Updated 9 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆29 Updated 11 months ago
51zero / eel-sdk
Big Data Toolkit for the JVM
☆145 Updated 4 years ago
pluralsight / hydra
A real-time data replication platform that "unbundles" the receiving, transforming, and transport of data streams.
☆82 Updated last year
pluralsight / hydra-spark
☆50 Updated 5 years ago
tharwaninitin / etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Goo…
☆44 Updated last year
embeddedkafka / embedded-kafka-schema-registry
A library that provides in-memory instances of both Kafka and Confluent Schema Registry to run your tests against.
☆114 Updated this week
HeartSaVioR / spark-state-tools
Spark Structured Streaming State Tools
☆34 Updated 5 years ago
CoxAutomotiveDataSolutions / waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76 Updated last year
agile-lab-dev / darwin
Avro Schema Evolution made easy
☆36 Updated last year
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51 Updated 5 years ago
streaming-analytics / Styx
Streaming Analytics platform, built with Apache Flink and Kafka
☆34 Updated 2 years ago
univalence / spark-tools
☆45 Updated 5 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 Updated 4 years ago
hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆40 Updated 2 years ago
indix / schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
☆112 Updated 5 years ago
mjakubowski84 / parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
☆297 Updated 3 months ago
chermenin / spark-states
Custom state store providers for Apache Spark
☆92 Updated 8 months ago
lightbend / model-serving-tutorial
Code and presentation for Strata Model Serving tutorial
☆68 Updated 6 years ago
godatadriven / iterative-broadcast-join
The iterative broadcast join example code.
☆70 Updated 7 years ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆62 Updated last year
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆60 Updated 2 years ago
criteo / cuttle
An embedded job scheduler.
☆116 Updated last year
radicalbit / NSDb
Natural Series Database
☆54 Updated 3 years ago