Spark integrations for working with Lance datasets
☆46Mar 16, 2026Updated this week
Alternatives and similar repositories for lance-spark
Users who are interested in lance-spark are comparing it to the libraries listed below
- Lance Namespace is an open specification for describing access and operations against a collection of tables in a multimodal lakehouse☆52Updated this week
- Community Java bindings for https://github.com/facebookincubator/velox☆41Updated this week
- A re-implementation of Hadoop DistCP in Apache Spark☆47Dec 20, 2023Updated 2 years ago
- Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.☆16Jan 4, 2026Updated 2 months ago
- A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.☆31Feb 5, 2026Updated last month
- Run Graph Queries with Lance☆132Mar 5, 2026Updated 2 weeks ago
- Hive for MR3☆38Mar 9, 2026Updated last week
- ☆13Jun 10, 2024Updated last year
- Alluxio Python client - Access Any Data Source with Python☆31Sep 29, 2025Updated 5 months ago
- Apache Hive Metastore in Standalone Mode With Docker☆14Jul 22, 2024Updated last year
- Olympia is a storage-only open catalog format for big data analytics, ML & AI.☆16May 5, 2025Updated 10 months ago
- Apache DataFusion Ray☆228Oct 5, 2025Updated 5 months ago
- HashCats Auto Clicker is a versatile tool that enhances your gaming experience by automating various actions within the HashCats game☆18Updated this week
- A collection of RBIR projects and posts for anyone interested in joining this journey.☆317Updated this week
- Apache OpenDAL Go Binding Services Releases☆15Sep 11, 2025Updated 6 months ago
- Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.☆1,224Updated this week
- A small tool for synchronizing data☆17Feb 27, 2026Updated 3 weeks ago
- ☆100Updated this week
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆122Updated this week
- The home of Floecat: A catalog of catalogs for open table formats☆58Updated this week
- World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.☆2,924Updated this week
- ☆16Aug 13, 2014Updated 11 years ago
- Flink Agents is an Agentic AI framework based on Apache Flink☆326Mar 13, 2026Updated last week
- Persistent data structures - immutable copy-on-write lists, maps and sets for Java☆11Feb 14, 2021Updated 5 years ago
- Idempotent query executor☆52Apr 28, 2025Updated 10 months ago
- Radio is a DuckDB extension by Query.Farm that brings real-time event streams into your SQL workflows. It enables DuckDB to receive and s…☆36Feb 18, 2026Updated last month
- ☆49Feb 14, 2022Updated 4 years ago
- Monitoring and insights on your data lakehouse tables☆32Mar 6, 2026Updated 2 weeks ago
- Sandboxing C in Rust☆19Jun 16, 2025Updated 9 months ago
- Apache Paimon Rust: the Rust implementation of Apache Paimon.☆150Updated this week
- A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset☆20Sep 29, 2025Updated 5 months ago
- A playground to experience Gravitino☆73Mar 16, 2026Updated last week
- attempt to create a library of code snippets I use a lot☆18Oct 3, 2014Updated 11 years ago
- a tailored Apache Calcite for Apache Kylin, more details at http://mail-archives.apache.org/mod_mbox/kylin-dev/201704.mbox/%3CCAF7etT=wEB…☆14Nov 7, 2025Updated 4 months ago
- Uniffle is a high performance, general purpose Remote Shuffle Service.☆446Updated this week
- Tasks API for Stateful Functions on Flink☆13Feb 28, 2026Updated 3 weeks ago
- bash script to find and execute java classes with main methods☆19Oct 24, 2025Updated 4 months ago
- Write property based tests easily on spark dataframes☆20Jan 19, 2024Updated 2 years ago
- This project demonstrates real-time streaming of CDC data from MySQL to Apache Iceberg using Flink SQL Client for faster data analytics a…☆23Jan 16, 2024Updated 2 years ago