eto-ai / rikai
Parquet-based ML data format optimized for working with unstructured data
☆140 · Updated 2 years ago
Alternatives and similar repositories for rikai
Users who are interested in rikai are comparing it to the libraries listed below.
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries. ☆341 · Updated last week
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … ☆230 · Updated last week
- Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving. ☆98 · Updated last year
- Point-in-Time optimizations for Apache Spark ☆30 · Updated last year
- Distributed SQL Query Engine in Python using Ray ☆243 · Updated 9 months ago
- General Metadata Architecture ☆127 · Updated this week
- Build reliable AI and agentic applications with DataFrames ☆130 · Updated this week
- ☆106 · Updated 2 years ago
- Ray-based Apache Beam runner ☆42 · Updated last year
- This library is an ongoing effort towards bringing the data exchanging ability between Java/Scala and Python. PyJava introduces Apache A… ☆48 · Updated 2 years ago
- The Internals of PySpark ☆26 · Updated 6 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆227 · Updated last week
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆45 · Updated 3 weeks ago
- Lightweight and Fast Feature Store Powered by Go (and Rust). ☆90 · Updated 3 years ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 · Updated 2 years ago
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … ☆143 · Updated 11 months ago
- The Internals of Delta Lake ☆184 · Updated 6 months ago
- Friendly ML feature store ☆45 · Updated 3 years ago
- Liga: Let Data Dance with ML Models ☆13 · Updated last year
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆37 · Updated 4 years ago
- Visualize column-level data lineage in Spark SQL ☆92 · Updated 3 years ago
- FeatHub - A stream-batch unified feature store for real-time machine learning ☆335 · Updated last year
- Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis… ☆21 · Updated last year
- Apache Calcite Adapter for Apache Kudu ☆28 · Updated 9 months ago
- A simple Spark-powered ETL framework that just works 🍺 ☆181 · Updated 2 weeks ago
- AI Flow is an open source framework that bridges big data and artificial intelligence. ☆176 · Updated 2 years ago
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆231 · Updated 5 months ago
- Clink is a library that provides APIs and infrastructure to facilitate the development of parallelizable feature engineering operators th… ☆29 · Updated 3 years ago
- Instant access to the Spark cluster from anywhere ☆16 · Updated 4 years ago
- ☆70 · Updated 6 months ago