Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
☆63 · Sep 6, 2024 · Updated last year
Alternatives and similar repositories for lighthouse
Users that are interested in lighthouse are comparing it to the libraries listed below.
- ☆14 · Feb 10, 2026 · Updated 2 months ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 · Apr 24, 2024 · Updated 2 years ago
- Maven plugin for generating Scala case classes and ADTs from Apache Avro schemas, datafiles, and protocols ☆10 · Sep 7, 2023 · Updated 2 years ago
- Dockerfiles maintained by Trivadis Platform Factory ☆12 · Mar 13, 2020 · Updated 6 years ago
- ☆32 · Mar 21, 2018 · Updated 8 years ago
- Sbt thin client in Scala.js running on Node ☆14 · Oct 27, 2018 · Updated 7 years ago
- ☆10 · Sep 17, 2020 · Updated 5 years ago
- Atlas custom type definitions ☆16 · Jun 23, 2021 · Updated 4 years ago
- Akka plugin to collect various data about actors ☆17 · Aug 19, 2024 · Updated last year
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 7 months ago
- Exploration of Convolutional Neural Networks using DeepLearning4J and Scala for Kaggle competition on Yelp Photo Classification ☆13 · Nov 3, 2016 · Updated 9 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 9 months ago
- ☆23 · Jun 14, 2021 · Updated 4 years ago
- Telco traffic simulator built with Scala, Akka and Play ☆15 · Mar 24, 2023 · Updated 3 years ago
- ☆11 · Aug 14, 2014 · Updated 11 years ago
- Dione - a Spark and HDFS indexing library ☆53 · Mar 26, 2026 · Updated last month
- Scala API for Apache Spark SQL high-order functions ☆14 · Aug 4, 2023 · Updated 2 years ago
- The Ninja Converter ☆13 · Nov 16, 2024 · Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 · Mar 23, 2026 · Updated last month
- A library that brings useful functions from various modern database management systems to Apache Spark ☆62 · Sep 4, 2023 · Updated 2 years ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆588 · Jan 24, 2024 · Updated 2 years ago
- Apache Spark OpenCPU Executor (ROSE) ☆25 · Jun 16, 2018 · Updated 7 years ago
- Data quality control tool built on spark and deequ ☆25 · Apr 10, 2026 · Updated 3 weeks ago
- File and folder naming convention checker written in rust ☆21 · May 28, 2019 · Updated 6 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆455 · Apr 2, 2026 · Updated 3 weeks ago
- Utility for benchmarking changes in Spark using TPC-DS workloads ☆16 · Jun 3, 2021 · Updated 4 years ago
- Experiments with symbolic functions in the Scala type system ☆27 · Jun 17, 2019 · Updated 6 years ago
- Kafka as a Datalog Engine ☆28 · Mar 31, 2025 · Updated last year
- Model complex data transformation pipelines easily ☆44 · Sep 23, 2022 · Updated 3 years ago
- Scalalaz podcast website generator ☆10 · Dec 25, 2024 · Updated last year
- Spark Structured Streaming State Tools ☆34 · Jul 3, 2020 · Updated 5 years ago
- something to help you spark ☆64 · Oct 23, 2018 · Updated 7 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · Apr 15, 2026 · Updated 2 weeks ago
- Reduce memory usage by running multiple applications in the same JVM. ☆13 · Jul 11, 2019 · Updated 6 years ago
- Query LDAP and AD with SQL ☆10 · Jun 17, 2021 · Updated 4 years ago
- Tapestry CSRF Protection ☆11 · Sep 23, 2025 · Updated 7 months ago
- sbt plugin to roll the Git history ☆132 · Dec 17, 2021 · Updated 4 years ago
- Point-in-Time optimizations for Apache Spark ☆30 · Jan 18, 2024 · Updated 2 years ago
- Writing application logic for Spark jobs that can be unit-tested without a SparkContext ☆76 · Jan 27, 2019 · Updated 7 years ago