lancedb / lance-spark

Spark integrations for working with Lance datasets

☆28

Alternatives and similar repositories for lance-spark

Users who are interested in lance-spark are comparing it to the libraries listed below.

boostscale / velox4j
Community Java bindings for https://github.com/facebookincubator/velox
☆35 Updated this week
lancedb / lance-namespace
Lance Namespace is an open specification on top of the storage-based Lance table and file format to standardize access to a collection of…
☆34 Updated last week
oap-project / Gluten-Trino
Gluten: Plugin to Boost Trino's Performance
☆76 Updated 2 years ago
apache / iceberg-docs
Apache Iceberg Documentation Site
☆42 Updated last year
oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆257 Updated 2 years ago
substrait-io / substrait-java
☆92 Updated this week
apache / uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
☆429 Updated this week
apache / paimon-trino
Trino Connector for Apache Paimon.
☆38 Updated 3 months ago
apache / kyuubi-client
Client libraries for end users of Apache Kyuubi
☆11 Updated 2 years ago
ververica / ForSt
A persistent key-value store designed for stream processing
☆112 Updated 7 months ago
linkedin / transport
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…
☆302 Updated this week
xskipper-io / xskipper
An Extensible Data Skipping Framework
☆47 Updated 3 months ago
flink-extended / flink-remote-shuffle
Remote Shuffle Service for Flink
☆190 Updated 2 years ago
onehouseinc / LakeView
Monitoring and insights on your data lakehouse tables
☆32 Updated 2 weeks ago
nexmark / nexmark
Benchmarks for queries over continuous data streams.
☆364 Updated 10 months ago
oap-project / velox
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
☆30 Updated this week
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆336 Updated 2 years ago
awesome-kyuubi / hadoop-testing
Testing Sandbox for Hadoop Ecosystem Components
☆37 Updated last month
lhbench / lhbench
Lakehouse storage system benchmark
☆76 Updated 2 years ago
maropu / spark-sql-flow-plugin
Visualize column-level data lineage in Spark SQL
☆92 Updated 3 years ago
CoxAutomotiveDataSolutions / spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
☆47 Updated last year
linkedin / coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
☆864 Updated 2 weeks ago
zabetak / calcite-tutorial
☆49 Updated 3 years ago
apache / flink-benchmarks
Benchmarks for Apache Flink
☆180 Updated 4 months ago
trinodb / benchto
Framework for running macro benchmarks in a clustered environment
☆36 Updated 7 months ago
ClickHouse / spark-clickhouse-connector
Spark ClickHouse Connector built on the DataSourceV2 API
☆209 Updated last week
ververica / frocksdb
☆66 Updated last year
apache / incubator-wayang
Apache Wayang (incubating) is the first cross-platform data processing system.
☆234 Updated 2 weeks ago
apache / flink-connector-mongodb
Apache Flink
☆49 Updated 3 months ago
getindata / flink-http-connector
HTTP connector for Apache Flink. Provides sources and sinks for the DataStream, Table, and SQL APIs.
☆192 Updated last week