eto-ai / rikai
Parquet-based ML data format optimized for working with unstructured data
☆140 Updated 2 years ago
Alternatives and similar repositories for rikai:
Users who are interested in rikai are comparing it to the libraries listed below.
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … ☆210 Updated this week
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries. ☆333 Updated 2 weeks ago
- Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving. ☆97 Updated 10 months ago
- Distributed SQL Query Engine in Python using Ray ☆243 Updated 7 months ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 Updated 2 years ago
- Clink is a library that provides APIs and infrastructure to facilitate the development of parallelizable feature engineering operators th… ☆29 Updated 3 years ago
- Unified storage framework for the entire machine learning lifecycle ☆156 Updated last year
- Lightweight and Fast Feature Store Powered by Go (and Rust). ☆89 Updated 3 years ago
- ☆85 Updated last week
- Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis… ☆21 Updated last year
- Point-in-Time optimizations for Apache Spark ☆30 Updated last year
- ☆105 Updated last year
- Liga: Let Data Dance with ML Models ☆13 Updated last year
- Ray-based Apache Beam runner ☆42 Updated last year
- An Extensible Data Skipping Framework ☆46 Updated 3 months ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆127 Updated 4 months ago
- Apache datasketches ☆95 Updated 2 years ago
- Flow with FlorDB 🌻 ☆155 Updated 2 weeks ago
- Ibis Substrait Compiler ☆102 Updated this week
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 Updated 5 months ago
- Apache Calcite Adapter for Apache Kudu ☆28 Updated 7 months ago
- Java binding to Apache DataFusion ☆80 Updated 3 weeks ago
- Visualize column-level data lineage in Spark SQL ☆91 Updated 2 years ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 Updated 2 years ago
- The Internals of PySpark ☆26 Updated 4 months ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 Updated 2 years ago
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL ☆39 Updated 7 months ago
- Distributed SQL Engine in Python using Dask ☆404 Updated 8 months ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 Updated 2 years ago
- Friendly ML feature store ☆45 Updated 2 years ago