Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
☆63Sep 6, 2024Updated last year
Alternatives and similar repositories for lighthouse
Users that are interested in lighthouse are comparing it to the libraries listed below.
- ☆14Feb 10, 2026Updated 4 months ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.☆76Apr 24, 2024Updated 2 years ago
- Maven plugin for generating Scala case classes and ADTs from Apache Avro schemas, datafiles, and protocols☆10Sep 7, 2023Updated 2 years ago
- Dockerfiles maintained by Trivadis Platform Factory☆12Mar 13, 2020Updated 6 years ago
- ☆32Mar 21, 2018Updated 8 years ago
- Scripts to build a Docker image with Apache Impala with Kudu support (no HDFS needed)☆16Nov 17, 2020Updated 5 years ago
- Adaptive File Source Connector for Spark, optimised for reading from object stores☆15Oct 18, 2022Updated 3 years ago
- Sbt thin client in Scala.js running on Node☆14Oct 27, 2018Updated 7 years ago
- Atlas custom type definitions☆17Jun 23, 2021Updated 5 years ago
- Akka plugin to collect various data about actors☆17Aug 19, 2024Updated last year
- Essential Spark extensions and helper methods ✨😲☆767Jun 22, 2026Updated last week
- Extensible streaming ingestion pipeline on top of Apache Spark☆47Jul 17, 2025Updated 11 months ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course given by Patrick Varilly to one of o…☆11Feb 4, 2016Updated 10 years ago
- Telco traffic simulator built with Scala, Akka and Play☆15Mar 24, 2023Updated 3 years ago
- ☆11Aug 14, 2014Updated 11 years ago
- Dione - a Spark and HDFS indexing library☆53Mar 26, 2026Updated 3 months ago
- Big Data Toolkit for the JVM☆147Nov 4, 2020Updated 5 years ago
- Scala API for Apache Spark SQL high-order functions☆14Aug 4, 2023Updated 2 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes☆64Mar 23, 2026Updated 3 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark☆63Sep 4, 2023Updated 2 years ago
- A simplified, lightweight ETL Framework based on Apache Spark☆588Jan 24, 2024Updated 2 years ago
- Apache Spark OpenCPU Executor (ROSE)☆25Jun 16, 2018Updated 8 years ago
- Data quality control tool built on spark and deequ☆25May 9, 2026Updated last month
- File and folder naming convention checker written in rust☆21May 28, 2019Updated 7 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)☆458Apr 2, 2026Updated 2 months ago
- Utility for benchmarking changes in Spark using TPC-DS workloads☆16Jun 3, 2021Updated 5 years ago
- Experiments with symbolic functions in the Scala type system☆27Jun 17, 2019Updated 7 years ago
- Kafka as a Datalog Engine☆28Mar 31, 2025Updated last year
- Model complex data transformation pipelines easily☆43Sep 23, 2022Updated 3 years ago
- Spark Structured Streaming State Tools☆34Jul 3, 2020Updated 5 years ago
- something to help you spark☆65Oct 23, 2018Updated 7 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆30May 13, 2026Updated last month
- Reduce memory usage by running multiple applications in the same JVM.☆13Jul 11, 2019Updated 6 years ago
- Spark to Tableau Extractor library☆19Oct 23, 2017Updated 8 years ago
- My journey to learn Scala.☆49Apr 21, 2019Updated 7 years ago
- sbt plugin to roll the Git history☆132Dec 17, 2021Updated 4 years ago
- Point-in-Time optimizations for Apache Spark☆30Jan 18, 2024Updated 2 years ago
- Writing application logic for Spark jobs that can be unit-tested without a SparkContext☆76Jan 27, 2019Updated 7 years ago
- Enhanced TestNG integration with Xray Test Management for Jira☆10Jan 27, 2026Updated 5 months ago