apache / arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
☆14,283Updated this week
Related projects:
- DuckDB is an analytical in-process SQL database management system☆22,674Updated this week
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows☆36,304Updated this week
- Apache DataFusion SQL Query Engine☆5,913Updated this week
- Parallel computing with task scheduling☆12,405Updated this week
- The official home of the Presto distributed SQL query engine for big data☆15,919Updated this week
- Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)☆10,189Updated this week
- ClickHouse® is a real-time analytics DBMS☆36,728Updated this week
- Dataframes powered by a multithreaded, vectorized query engine, written in Rust☆29,261Updated this week
- The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.☆5,723Updated this week
- An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.☆17,461Updated this week
- A library that provides an embeddable, persistent key-value store for fast storage.☆28,269Updated this week
- Apache Druid: a high performance real-time analytics database.☆13,405Updated this week
- Apache Iceberg☆6,161Updated this week
- Apache Beam is a unified programming model for Batch and Streaming data processing.☆7,772Updated this week
- NoSQL data store using the seastar framework, compatible with Apache Cassandra☆13,236Updated this week
- A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.☆3,420Updated this week
- Apache Pinot - A realtime distributed OLAP datastore☆5,393Updated this week
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis…☆17,705Updated last week
- CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placemen…☆29,885Updated this week
- Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://data…☆7,684Updated this week
- Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries f…☆33,048Updated this week
- Distributed transactional key-value database, originally created to complement TiDB☆15,023Updated this week
- An orchestration platform for the development, production, and observation of data assets.☆11,155Updated this week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python.☆15,830Updated this week
- Apache Parquet Java☆2,558Updated 2 weeks ago
- Apache Spark - A unified analytics engine for large-scale data processing☆39,296Updated this week
- YugabyteDB - the cloud native distributed SQL database for mission-critical applications.☆8,870Updated this week
- Machine Learning Toolkit for Kubernetes☆14,193Updated this week
- Open source platform for the machine learning lifecycle☆18,340Updated this week
- Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!☆9,416Updated this week