TileDB-Inc / TileDB-Spark
Spark interface to the TileDB storage manager [please see README]
☆17 · Updated last year
Alternatives and similar repositories for TileDB-Spark
Users who are interested in TileDB-Spark compare it to the libraries listed below.
- Provides GPU awareness to Spark, Contact: @kmadhugit and @kiszk ☆172 · Updated 7 years ago
- Java read and write example for Apache Arrow ☆34 · Updated 8 years ago
- Enabling queries on compressed data. ☆282 · Updated 2 years ago
- Drizzle integration with Apache Spark ☆120 · Updated 7 years ago
- JVM integration for Weld ☆16 · Updated 7 years ago
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries. ☆362 · Updated last week
- ☆108 · Updated 2 years ago
- Vectorized processing for Apache Arrow ☆485 · Updated 3 years ago
- Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing ☆253 · Updated 5 years ago
- A library for exporting Spark ML models and pipelines to PFA ☆54 · Updated 7 years ago
- Spark SQL index for Parquet tables ☆134 · Updated 4 years ago
- Mirror of Apache DataFu ☆121 · Updated 8 months ago
- Parameter Server implementation in Apache Flink ☆56 · Updated 7 years ago
- Spark Shuffle Optimization with RDMA+AEP ☆30 · Updated 2 years ago
- An open-source, vendor-neutral data context service. ☆161 · Updated 7 years ago
- An experimental Graph Streaming API for Apache Flink ☆141 · Updated 5 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆130 · Updated last year
- A tool and library for easily deploying applications on Apache YARN ☆146 · Updated last year
- Common library for serving TensorFlow, XGBoost and scikit-learn models in production. ☆143 · Updated 2 years ago
- Website for DataSketches. ☆108 · Updated 2 weeks ago
- BlinkDB: Sub-Second Approximate Queries on Very Large Data. ☆659 · Updated 12 years ago
- FlorDB 🌻 ☆158 · Updated 3 months ago
- Hops Hadoop is a distribution of Apache Hadoop with distributed metadata. ☆320 · Updated 2 weeks ago
- ☆108 · Updated 3 years ago
- Parquet-based ML data format optimized for working with unstructured data ☆141 · Updated 3 years ago
- Self regulation and auto-tuning for distributed system ☆67 · Updated 2 years ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆258 · Updated 2 years ago
- Parameter Server implementation in Apache Flink. ☆14 · Updated 7 years ago
- Cache File System optimized for columnar formats and object stores ☆187 · Updated 3 years ago
- Use the TPC-DS benchmark to test Spark SQL performance ☆184 · Updated 5 years ago