Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
☆62 · Sep 6, 2024 · Updated last year
Alternatives and similar repositories for lighthouse
Users who are interested in lighthouse are comparing it to the libraries listed below.
- ☆14 · Feb 10, 2026 · Updated last month
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 · Apr 24, 2024 · Updated last year
- Maven plugin for generating Scala case classes and ADTs from Apache Avro schemas, datafiles, and protocols ☆10 · Sep 7, 2023 · Updated 2 years ago
- Dockerfiles maintained by Trivadis Platform Factory ☆12 · Mar 13, 2020 · Updated 6 years ago
- A K8s-based infrastructure for analytics ☆24 · Jan 15, 2020 · Updated 6 years ago
- ☆32 · Mar 21, 2018 · Updated 8 years ago
- Adaptive File Source Connector for Spark, optimised for reading from object stores ☆15 · Oct 18, 2022 · Updated 3 years ago
- Sbt thin client in Scala.js running on Node ☆14 · Oct 27, 2018 · Updated 7 years ago
- ☆10 · Sep 17, 2020 · Updated 5 years ago
- Akka plugin to collect various data about actors ☆17 · Aug 19, 2024 · Updated last year
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 6 months ago
- Exploration of Convolutional Neural Networks using DeepLearning4J and Scala for Kaggle competition on Yelp Photo Classification ☆13 · Nov 3, 2016 · Updated 9 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 8 months ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course given by Patrick Varilly to one of o… ☆11 · Feb 4, 2016 · Updated 10 years ago
- Telco traffic simulator built with Scala, Akka and Play ☆15 · Mar 24, 2023 · Updated 2 years ago
- ☆11 · Aug 14, 2014 · Updated 11 years ago
- Dione - a Spark and HDFS indexing library ☆52 · Oct 27, 2025 · Updated 4 months ago
- Scala API for Apache Spark SQL high-order functions ☆14 · Aug 4, 2023 · Updated 2 years ago
- The Ninja Converter ☆13 · Nov 16, 2024 · Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 · Sep 4, 2023 · Updated 2 years ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆587 · Jan 24, 2024 · Updated 2 years ago
- Apache Spark OpenCPU Executor (ROSE) ☆26 · Jun 16, 2018 · Updated 7 years ago
- Data quality control tool built on spark and deequ ☆25 · Mar 3, 2026 · Updated 2 weeks ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆455 · Feb 8, 2026 · Updated last month
- Utility for benchmarking changes in Spark using TPC-DS workloads ☆16 · Jun 3, 2021 · Updated 4 years ago
- Experiments with symbolic functions in the Scala type system ☆27 · Jun 17, 2019 · Updated 6 years ago
- Kafka as a Datalog Engine ☆28 · Mar 31, 2025 · Updated 11 months ago
- Model complex data transformation pipelines easily ☆44 · Sep 23, 2022 · Updated 3 years ago
- Spark Structured Streaming State Tools ☆34 · Jul 3, 2020 · Updated 5 years ago
- something to help you spark ☆64 · Oct 23, 2018 · Updated 7 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Reduce memory usage by running multiple applications in the same JVM. ☆13 · Jul 11, 2019 · Updated 6 years ago
- Spark to Tableau Extractor library ☆19 · Oct 23, 2017 · Updated 8 years ago
- Tapestry CSRF Protection ☆11 · Sep 23, 2025 · Updated 5 months ago
- My journey to learn Scala. ☆49 · Apr 21, 2019 · Updated 6 years ago
- Point-in-Time optimizations for Apache Spark ☆30 · Jan 18, 2024 · Updated 2 years ago
- Writing application logic for Spark jobs that can be unit-tested without a SparkContext ☆76 · Jan 27, 2019 · Updated 7 years ago
- Egeria's Guidance on Governance as well as large media files such as presentations and movies ☆107 · Oct 20, 2022 · Updated 3 years ago
- NLP Utilities in Java ☆43 · Dec 14, 2022 · Updated 3 years ago