Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
☆62 · Sep 6, 2024 · Updated last year
Alternatives and similar repositories for lighthouse
Users interested in lighthouse are comparing it to the libraries listed below.
- ☆14 · Feb 10, 2026 · Updated 2 weeks ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 · Apr 24, 2024 · Updated last year
- Akka plugin to collect various data about actors ☆17 · Aug 19, 2024 · Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 · Sep 4, 2023 · Updated 2 years ago
- A K8s-based infrastructure for analytics ☆24 · Jan 15, 2020 · Updated 6 years ago
- ☆11 · Aug 14, 2014 · Updated 11 years ago
- Adaptive File Source Connector for Spark, optimised for reading from object stores ☆15 · Oct 18, 2022 · Updated 3 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 7 months ago
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 5 months ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course given by Patrick Varilly to one of o… ☆11 · Feb 4, 2016 · Updated 10 years ago
- ☆10 · Sep 17, 2020 · Updated 5 years ago
- Dockerfiles maintained by Trivadis Platform Factory ☆12 · Mar 13, 2020 · Updated 5 years ago
- Data quality control tool built on Spark and Deequ ☆25 · Jan 22, 2026 · Updated last month
- Point-in-Time optimizations for Apache Spark ☆30 · Jan 18, 2024 · Updated 2 years ago
- Sbt thin client in Scala.js running on Node ☆14 · Oct 27, 2018 · Updated 7 years ago
- Reduce memory usage by running multiple applications in the same JVM. ☆13 · Jul 11, 2019 · Updated 6 years ago
- Big Data Toolkit for the JVM ☆146 · Nov 4, 2020 · Updated 5 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Apache Spark OpenCPU Executor (ROSE) ☆26 · Jun 16, 2018 · Updated 7 years ago
- Model complex data transformation pipelines easily ☆45 · Sep 23, 2022 · Updated 3 years ago
- Telco traffic simulator built with Scala, Akka and Play ☆15 · Mar 24, 2023 · Updated 2 years ago
- Big Data Science Swiss Army Knife - http://www.tuktu.io ☆60 · Feb 15, 2018 · Updated 8 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆454 · Feb 8, 2026 · Updated 3 weeks ago
- Spark NLP for Streamlit ☆15 · Sep 12, 2021 · Updated 4 years ago
- ☆32 · Mar 21, 2018 · Updated 7 years ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆587 · Jan 24, 2024 · Updated 2 years ago
- Egeria's Guidance on Governance as well as large media files such as presentations and movies ☆107 · Oct 20, 2022 · Updated 3 years ago
- Spark Structured Streaming State Tools ☆34 · Jul 3, 2020 · Updated 5 years ago
- something to help you spark ☆64 · Oct 23, 2018 · Updated 7 years ago
- Data-Driven Spark allows quick data exploration based on Apache Spark. ☆29 · Jan 6, 2017 · Updated 9 years ago
- Writing application logic for Spark jobs that can be unit-tested without a SparkContext ☆76 · Jan 27, 2019 · Updated 7 years ago
- Data monitoring tool, monitors the result, not the run ☆16 · Dec 16, 2021 · Updated 4 years ago
- Utility for benchmarking changes in Spark using TPC-DS workloads ☆16 · Jun 3, 2021 · Updated 4 years ago
- Data quality tools for Big Data ☆19 · Oct 10, 2019 · Updated 6 years ago
- Apache Zeppelin Service for Apache Ambari Service. Installation and management of Zeppelin via Ambari. ☆14 · Jan 23, 2016 · Updated 10 years ago
- Spark package for checking data quality ☆223 · Feb 28, 2020 · Updated 6 years ago
- Apache Amaterasu ☆56 · Oct 18, 2019 · Updated 6 years ago
- Distributed solver library for large-scale structured output prediction, based on Spark. Project website: ☆17 · Mar 3, 2016 · Updated 9 years ago
- Graph algorithms implemented in GraphX and Spark styles ☆15 · Apr 26, 2015 · Updated 10 years ago