Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
☆62 Sep 6, 2024 Updated last year
Alternatives and similar repositories for lighthouse
Users that are interested in lighthouse are comparing it to the libraries listed below.
- ☆14 Feb 10, 2026 Updated 2 months ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Apr 24, 2024 Updated last year
- Maven plugin for generating Scala case classes and ADTs from Apache Avro schemas, datafiles, and protocols ☆10 Sep 7, 2023 Updated 2 years ago
- A K8s-based infrastructure for analytics ☆24 Jan 15, 2020 Updated 6 years ago
- ☆32 Mar 21, 2018 Updated 8 years ago
- Scripts to build a Docker image with Apache Impala with Kudu support (no HDFS needed) ☆16 Nov 17, 2020 Updated 5 years ago
- Adaptive File Source Connector for Spark, optimised for reading from object stores ☆15 Oct 18, 2022 Updated 3 years ago
- Sbt thin client in Scala.js running on Node ☆14 Oct 27, 2018 Updated 7 years ago
- ☆10 Sep 17, 2020 Updated 5 years ago
- Akka plugin to collect various data about actors ☆17 Aug 19, 2024 Updated last year
- Essential Spark extensions and helper methods ✨😲 ☆766 Sep 14, 2025 Updated 6 months ago
- Exploration of Convolutional Neural Networks using DeepLearning4J and Scala for Kaggle competition on Yelp Photo Classification ☆13 Nov 3, 2016 Updated 9 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 Jul 17, 2025 Updated 8 months ago
- ☆23 Jun 14, 2021 Updated 4 years ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course given by Patrick Varilly to one of o… ☆11 Feb 4, 2016 Updated 10 years ago
- Telco traffic simulator built with Scala, Akka and Play ☆15 Mar 24, 2023 Updated 3 years ago
- ☆11 Aug 14, 2014 Updated 11 years ago
- Big Data Toolkit for the JVM ☆148 Nov 4, 2020 Updated 5 years ago
- Scala API for Apache Spark SQL high-order functions ☆14 Aug 4, 2023 Updated 2 years ago
- The Ninja Converter ☆13 Nov 16, 2024 Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 Mar 23, 2026 Updated 2 weeks ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆62 Sep 4, 2023 Updated 2 years ago
- Is there a picture with wrong orientation, or just displayed too small? Rotate or zoom images directly on any website, just one in the co… ☆17 Mar 31, 2022 Updated 4 years ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆587 Jan 24, 2024 Updated 2 years ago
- Apache Spark OpenCPU Executor (ROSE) ☆25 Jun 16, 2018 Updated 7 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆455 Apr 2, 2026 Updated last week
- Utility for benchmarking changes in Spark using TPC-DS workloads ☆16 Jun 3, 2021 Updated 4 years ago
- Experiments with symbolic functions in the Scala type system ☆27 Jun 17, 2019 Updated 6 years ago
- Kafka as a Datalog Engine ☆28 Mar 31, 2025 Updated last year
- Model complex data transformation pipelines easily ☆44 Sep 23, 2022 Updated 3 years ago
- something to help you spark ☆64 Oct 23, 2018 Updated 7 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 Nov 4, 2024 Updated last year
- Reduce memory usage by running multiple applications in the same JVM. ☆13 Jul 11, 2019 Updated 6 years ago
- Query LDAP and AD with SQL ☆10 Jun 17, 2021 Updated 4 years ago
- Spark to Tableau Extractor library ☆19 Oct 23, 2017 Updated 8 years ago
- Tapestry CSRF Protection ☆11 Sep 23, 2025 Updated 6 months ago
- My journey to learn Scala. ☆49 Apr 21, 2019 Updated 6 years ago
- sbt plugin to roll the Git history ☆132 Dec 17, 2021 Updated 4 years ago
- Point-in-Time optimizations for Apache Spark ☆30 Jan 18, 2024 Updated 2 years ago