apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆15,929 · Updated this week
Alternatives and similar repositories for arrow
Users who are interested in arrow are comparing it to the libraries listed below.
- DuckDB is an analytical in-process SQL database management system ☆32,655 · Updated last week
- Apache DataFusion SQL Query Engine ☆7,709 · Updated this week
- A composable and fully extensible C++ execution engine library for data management systems. ☆3,883 · Updated this week
- the portable Python dataframe library ☆6,094 · Updated this week
- Real-time Data Integration and Transformation: use SQL to transform, deliver, and act on fast-changing data. ☆6,108 · Updated this week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆8,268 · Updated this week
- Apache Parquet Java ☆2,935 · Updated last week
- Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, v… ☆5,363 · Updated this week
- Apache Parquet Format ☆2,032 · Updated 2 weeks ago
- AI-Native Data Warehouse. Open-source Snowflake alternative. Proven at petabyte scale with enterprise performance. B… ☆8,822 · Updated this week
- Parallel computing with task scheduling ☆13,481 · Updated this week
- High-performance runtime for data analytics applications ☆3,002 · Updated 3 years ago
- Apache Iceberg ☆7,937 · Updated last week
- NoSQL data store using the Seastar framework, compatible with Apache Cassandra and Amazon DynamoDB ☆14,847 · Updated last week
- Distributed transactional key-value database, originally created to complement TiDB ☆16,121 · Updated this week
- Apache Beam is a unified programming model for Batch and Streaming data processing. ☆8,291 · Updated this week
- Extremely fast Query Engine for DataFrames, written in Rust ☆35,310 · Updated this week
- ClickHouse® is a real-time analytics database management system ☆42,856 · Updated this week
- cuDF - GPU DataFrame Library ☆9,170 · Updated this week
- The official home of the Presto distributed SQL query engine for big data ☆16,498 · Updated this week
- Alluxio, data orchestration for analytics and machine learning in the cloud ☆7,067 · Updated 4 months ago
- Apache Druid: a high performance real-time analytics database. ☆13,821 · Updated this week
- Real-time event streaming platform. Streaming CDC, stream processing, low-latency serving, and Iceberg management. ☆8,316 · Updated this week
- Apache Pinot - A realtime distributed OLAP datastore ☆5,894 · Updated this week
- The Universal Storage Engine ☆1,976 · Updated this week
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ☆18,477 · Updated 3 months ago
- Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM! ☆10,985 · Updated this week
- A modular implementation of timely dataflow in Rust ☆3,504 · Updated 2 weeks ago
- Official Rust implementation of Apache Arrow ☆3,121 · Updated this week
- A library that provides an embeddable, persistent key-value store for fast storage. ☆30,491 · Updated this week