Spark integrations for working with Lance datasets
☆56 · Jun 26, 2026 · Updated this week
Alternatives and similar repositories for lance-spark
Users who are interested in lance-spark are comparing it to the libraries listed below.
- Lance Namespace is an open specification for describing access and operations against a collection of tables in a multimodal lakehouse · ☆55 · Jun 23, 2026 · Updated last week
- Community Java bindings for https://github.com/facebookincubator/velox · ☆43 · Updated this week
- A re-implementation of Hadoop DistCP in Apache Spark · ☆47 · Dec 20, 2023 · Updated 2 years ago
- Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. · ☆16 · May 22, 2026 · Updated last month
- Testing Sandbox for Hadoop Ecosystem Components · ☆45 · Jun 16, 2026 · Updated 2 weeks ago
- Run Graph Queries with Lance · ☆158 · Jun 21, 2026 · Updated last week
- Hive for MR3 · ☆39 · Jun 22, 2026 · Updated last week
- ☆13 · Jun 10, 2024 · Updated 2 years ago
- Alluxio Python client - Access Any Data Source with Python · ☆31 · Sep 29, 2025 · Updated 9 months ago
- Apache Hive Metastore in Standalone Mode With Docker · ☆14 · Jul 22, 2024 · Updated last year
- Olympia is a storage-only open catalog format for big data analytics, ML & AI. · ☆16 · May 5, 2025 · Updated last year
- Apache DataFusion Ray · ☆231 · May 15, 2026 · Updated last month
- A collection of RBIR projects and posts for anyone interested in joining this journey. · ☆325 · Updated this week
- Apache OpenDAL Go Binding Services Releases · ☆16 · Jun 1, 2026 · Updated last month
- OSPP 2022 Project: String Adaptive Hash Table for Databend · ☆19 · Sep 15, 2022 · Updated 3 years ago
- A small tool for synchronizing data · ☆19 · Feb 27, 2026 · Updated 4 months ago
- ☆100 · Jun 23, 2026 · Updated last week
- Smart Automation Tool for building modern Data Lakes and Data Pipelines · ☆126 · Updated this week
- World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake. · ☆3,031 · Updated this week
- Persistent data structures - immutable copy-on-write lists, maps and sets for Java · ☆11 · Feb 14, 2021 · Updated 5 years ago
- The home of Floecat: A catalog of catalogs for open table formats · ☆86 · Updated this week
- A CLI for spinning up and managing Ray clusters for the Daft Query Engine. · ☆15 · Feb 15, 2025 · Updated last year
- Idempotent query executor · ☆53 · Apr 28, 2025 · Updated last year
- Apache Iceberg Documentation Site · ☆42 · Feb 5, 2024 · Updated 2 years ago
- Radio is a DuckDB extension by Query.Farm that brings real-time event streams into your SQL workflows. It enables DuckDB to receive and s… · ☆42 · Mar 29, 2026 · Updated 3 months ago
- ☆49 · Feb 14, 2022 · Updated 4 years ago
- Sandboxing C in Rust · ☆19 · Jun 16, 2025 · Updated last year
- A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset · ☆26 · Sep 29, 2025 · Updated 9 months ago
- A playground to experience Gravitino · ☆79 · May 15, 2026 · Updated last month
- Uniffle is a high performance, general purpose Remote Shuffle Service. · ☆449 · May 27, 2026 · Updated last month
- bash script to find and execute java classes with main methods · ☆20 · Oct 24, 2025 · Updated 8 months ago
- Write property based tests easily on spark dataframes · ☆21 · Jan 19, 2024 · Updated 2 years ago
- Unity Catalog Explorer is a TypeScript + Next.js based Web UI for the Unity Catalog OSS. · ☆13 · Jun 29, 2024 · Updated 2 years ago
- Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines. · ☆1,575 · Updated this week
- An Extensible Data Skipping Framework · ☆50 · Jul 15, 2025 · Updated 11 months ago
- Java implementation for performing operations on Apache Iceberg and Hive tables · ☆21 · May 25, 2026 · Updated last month
- OCRA: Object-store Cache in Rust for All · ☆18 · Sep 29, 2025 · Updated 9 months ago
- An exploratory visualization tool for the analysis of movements between geographic locations · ☆13 · Dec 9, 2022 · Updated 3 years ago
- Cookbook recipes to get up and running with Spice.ai quickly 🚀 · ☆28 · Jun 16, 2026 · Updated 2 weeks ago