datamindedbe/lighthouse

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

☆64

Alternatives and similar repositories for lighthouse

Users interested in lighthouse are comparing it to the libraries listed below.

timvw / arrow-flightsql-odbc
☆14 · Feb 10, 2026 · Updated 5 months ago
CoxAutomotiveDataSolutions / waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76 · Apr 24, 2024 · Updated 2 years ago
makubi / avrohugger-maven-plugin
Maven plugin for generating Scala case classes and ADTs from Apache Avro schemas, datafiles, and protocols
☆10 · Sep 7, 2023 · Updated 2 years ago
TrivadisPF / dockerfiles
Dockerfiles maintained by Trivadis Platform Factory
☆12 · Mar 13, 2020 · Updated 6 years ago
godatadriven / scala-spark-application
☆32 · Mar 21, 2018 · Updated 8 years ago
hunters-ai / spark-adaptive-file-connector
Adaptive File Source Connector for Spark, optimised for reading from object stores
☆15 · Oct 18, 2022 · Updated 3 years ago
dwarszawski / amundsen-atlas-types
Atlas custom type definitions
☆17 · Jun 23, 2021 · Updated 5 years ago
andreaTP / sbtcli
Sbt thin client in Scala.js running on Node
☆14 · Oct 27, 2018 · Updated 7 years ago
brooksandrew / kaggle_yelp
Exploration of Convolutional Neural Networks using DeepLearning4J and Scala for Kaggle competition on Yelp Photo Classification
☆12 · Nov 3, 2016 · Updated 9 years ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated 3 weeks ago
ScalaConsultants / akka-periscope
Akka plugin to collect various data about actors
☆17 · Aug 19, 2024 · Updated last year
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆47 · Jul 17, 2025 · Updated last year
botkop / botkop-telcotraffic-simulator
Telco traffic simulator built with Scala, Akka and Play
☆15 · Mar 24, 2023 · Updated 3 years ago
sodadata / soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
☆64 · Mar 23, 2026 · Updated 3 months ago
51zero / eel-sdk
Big Data Toolkit for the JVM
☆147 · Nov 4, 2020 · Updated 5 years ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
porscheinformatik / tapestry-csrf-protection
Tapestry CSRF Protection
☆11 · Sep 23, 2025 · Updated 9 months ago
and-rej / rotate-and-zoom-image
Is there a picture with wrong orientation, or just displayed too small? Rotate or zoom images directly on any website, just one in the co…
☆17 · Mar 31, 2022 · Updated 4 years ago
sodadata / soda-streaming
☆23 · Jun 14, 2021 · Updated 5 years ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆63 · Sep 4, 2023 · Updated 2 years ago
tdas / spark-streaming-benchmark
☆11 · Aug 14, 2014 · Updated 11 years ago
YotpoLtd / metorikku
A simplified, lightweight ETL Framework based on Apache Spark
☆588 · Jan 24, 2024 · Updated 2 years ago
YotpoLtd / cADR
🤖 AI-powered ADR generation - Automatically capture and document architectural decisions as you code
☆15 · Apr 16, 2026 · Updated 3 months ago
adelbertc / sabre
In-memory distributed graph processing of trivially parallelizable graph algorithms.
☆22 · Apr 17, 2013 · Updated 13 years ago
iboss-ptk / ficon
File and folder naming convention checker written in rust
☆21 · May 28, 2019 · Updated 7 years ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆458 · Apr 2, 2026 · Updated 3 months ago
onetapbeyond / opencpu-spark-executor
Apache Spark OpenCPU Executor (ROSE)
☆25 · Jun 16, 2018 · Updated 8 years ago
timgent / data-flare
Data quality control tool built on spark and deequ
☆25 · May 9, 2026 · Updated 2 months ago
itkpi / trembita
Model complex data transformation pipelines easily
☆43 · Sep 23, 2022 · Updated 3 years ago
scalalaz-podcast / scalalaz-gen
Scalalaz podcast website generator
☆10 · Dec 25, 2024 · Updated last year
dataengi / crm-seed
Scala CRM Seed
☆15 · Feb 9, 2018 · Updated 8 years ago
rayokota / kdatalog
Kafka as a Datalog Engine
☆28 · Mar 31, 2025 · Updated last year
palantir / spark-tpcds-benchmark
Utility for benchmarking changes in Spark using TPC-DS workloads
☆16 · Jun 3, 2021 · Updated 5 years ago
mohnoor94 / LearningScala
My journey to learn Scala.
☆49 · Apr 21, 2019 · Updated 7 years ago
jeremyrsmith / baudrillard
Experiments with symbolic functions in the Scala type system
☆27 · Jun 17, 2019 · Updated 7 years ago
devmindset / sparkscalainterview
Contain Interview Questions Solutions
☆12 · May 18, 2018 · Updated 8 years ago
radanalyticsio / silex
something to help you spark
☆65 · Oct 23, 2018 · Updated 7 years ago
fretn / sqldap
Query LDAP and AD with SQL
☆10 · Jun 17, 2021 · Updated 5 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago