CoxAutomotiveDataSolutions/waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

☆76

Alternatives and similar repositories for waimak

Users who are interested in waimak are comparing it to the libraries listed below.

AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆64 · Sep 6, 2024 · Updated last year
montevideolabs / attractions-recommender
☆12 · Jun 6, 2020 · Updated 6 years ago
sithankanna / naive-bayesians
Repo for the Naive Bayesian Meetup Group
☆11 · Nov 12, 2021 · Updated 4 years ago
YotpoLtd / metorikku
A simplified, lightweight ETL Framework based on Apache Spark
☆588 · Jan 24, 2024 · Updated 2 years ago
51zero / eel-sdk
Big Data Toolkit for the JVM
☆147 · Nov 4, 2020 · Updated 5 years ago
liquidm / druid-dumbo
☆21 · Mar 17, 2023 · Updated 3 years ago
KoddiDev / geocoder
Google Maps geocoding library for Scala
☆12 · Oct 12, 2019 · Updated 6 years ago
vaslabs / sbt-kubeyml
Sbt plugin to help deploy Scala applications to Kubernetes
☆40 · Jul 13, 2026 · Updated last week
openaire / vipe
Tool for visualizing Apache Oozie pipelines
☆13 · Feb 15, 2016 · Updated 10 years ago
OpenLinkSoftware / ai-agent-skills
OPAL AI Agent Skills Collection (Skills.md Compliant)
☆31 · Updated this week
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
twitter / iago2
A load generator, built for engineers
☆28 · Apr 10, 2023 · Updated 3 years ago
hortonworks-spark / spark-llap
☆102 · Mar 23, 2020 · Updated 6 years ago
amient / affinity
Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka
☆25 · Oct 16, 2020 · Updated 5 years ago
AbsaOSS / pramen
Resilient data pipeline framework running on Apache Spark
☆31 · Updated this week
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
jeoffreylim / maelstrom
Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream …
☆21 · Feb 6, 2017 · Updated 9 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,637 · Updated this week
Thanos-DB / GlobalLocalFilterNetworks
[ISBI 2024] Official implementation of GLOBAL-LOCAL (FREQUENCY) FILTER NETWORKS FOR EFFICIENT MEDICAL IMAGE SEGMENTATION
☆14 · May 28, 2024 · Updated 2 years ago
AbsaOSS / enceladus
Dynamic Conformance Engine
☆33 · Mar 26, 2026 · Updated 4 months ago
apache / incubator-retired-amaterasu
Apache Amaterasu
☆56 · Oct 18, 2019 · Updated 6 years ago
MrPowers / beavis
Pandas helper functions
☆31 · Feb 19, 2023 · Updated 3 years ago
eed3si9n / sudori
☆12 · Nov 12, 2021 · Updated 4 years ago
src-d / code-annotation
🐈 Code Annotation Tool
☆29 · Oct 8, 2019 · Updated 6 years ago
CoxAutomotiveDataSolutions / spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
☆47 · Dec 20, 2023 · Updated 2 years ago
emmalanguage / emma
A quotation-based Scala DSL for scalable data analysis.
☆65 · Jul 7, 2022 · Updated 4 years ago
ing-bank / scruid
Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
☆118 · Jul 4, 2021 · Updated 5 years ago
hammerlab / spark-util
Low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
yohanliyanage / jenkins-spark-deploy
A Jenkins plugin for deploying and stopping Apache Spark applications in Spark standalone clusters.
☆10 · Oct 25, 2015 · Updated 10 years ago
sparsecode / DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…
☆26 · Jun 7, 2021 · Updated 5 years ago
ExpediaGroup / drone-fly
A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service
☆13 · Jun 30, 2026 · Updated 3 weeks ago
techmonad / spark-data-pipeline
This project describes how to write a full ETL data pipeline using Spark.
☆15 · Oct 15, 2022 · Updated 3 years ago
kelindar / timeline
Scheduler of events for near real-time systems
☆31 · Aug 21, 2025 · Updated 11 months ago
picnicml / doddle-model
doddle-model: machine learning in Scala.
☆139 · Aug 13, 2024 · Updated last year
uber / uberscriptquery
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
☆65 · Dec 17, 2023 · Updated 2 years ago
indix / sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
☆28 · May 15, 2020 · Updated 6 years ago
prajwalrao / ambari-metrics-grafana
Ambari Metrics System Plugin for Grafana > v4.5.x
☆24 · Oct 2, 2018 · Updated 7 years ago