zetaris / lightning-catalog
The Lightning Catalog is an open-source data catalog designed for preparing data at any scale in ad-hoc analytics, data virtualization, data warehousing, lake houses, and ML projects.
☆34Updated 3 months ago
Alternatives and similar repositories for lightning-catalog
Users that are interested in lightning-catalog are comparing it to the libraries listed below
- Spark Connector to read and write with Pulsar☆116Updated 2 months ago
- Visualize column-level data lineage in Spark SQL☆92Updated 3 years ago
- HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)☆63Updated 2 months ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.☆37Updated 2 years ago
- Apache DataLab (incubating)☆152Updated 2 years ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆94Updated 6 months ago
- Monitoring and insights on your data lakehouse tables☆33Updated last week
- LinkedIn's version of Apache Calcite☆23Updated 4 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…☆97Updated last week
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL☆46Updated last year
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆123Updated last week
- Developing Spark External Data Sources using the V2 API☆48Updated 7 years ago
- PostgreSQL and GreenPlum Data Source for Apache Spark☆35Updated 4 months ago
- ☆40Updated 2 years ago
- On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.☆35Updated 7 months ago
- Flink Controller implements a Kubernetes Custom Controller (aka Kubernetes Operator) for Apache Flink☆53Updated last month
- ☆94Updated this week
- An S3 Shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads.☆48Updated 2 months ago
- A re-implementation of Hadoop DistCP in Apache Spark☆47Updated last year
- Schema Registry integration for Apache Spark☆40Updated 3 years ago
- pulsar lakehouse connector☆35Updated 8 months ago
- REST API for Apache Spark on K8S or YARN☆108Updated this week
- Storage connector for Trino☆116Updated this week
- Already merged into (apache/incubator-kyuubi): ACL Management for Apache Spark SQL with Apache Ranger.☆57Updated 4 years ago
- Spark Structured Streaming State Tools☆34Updated 5 years ago
- Tutorial on how to set up Trino and Apache Ranger using Docker☆41Updated last year
- ☆107Updated 2 years ago
- Port of TPC-DS dsdgen to Java☆50Updated last year
- SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.☆134Updated 2 years ago
- Hive for MR3☆37Updated 3 weeks ago