apache/datafusion

Apache DataFusion SQL Query Engine

☆8,978

Alternatives and similar repositories for datafusion

Users who are interested in datafusion are comparing it to the repositories listed below.

apache / arrow-rs
Official Rust implementation of Apache Arrow
☆3,526 · Updated this week
apache / datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
☆2,083 · Updated this week
databendlabs / databend
Data Agent Ready Warehouse : One for Analytics, Search, AI, Python Sandbox. — rebuilt from scratch. Unified architecture on your S3.
☆9,388 · Updated this week
risingwavelabs / risingwave
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
☆9,167 · Updated this week
apache / datafusion-comet
Apache DataFusion Comet Spark Accelerator
☆1,226 · Updated this week
apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆16,931 · Updated this week
delta-io / delta-rs
A native Rust library for Delta Lake, with bindings into Python
☆3,259 · Updated this week
apache / iceberg-rust
Apache Iceberg
☆1,340 · Updated this week
ArroyoSystems / arroyo
Distributed stream processing engine in Rust
☆4,961 · Updated this week
apache / datafusion-sqlparser-rs
Extensible SQL Lexer and Parser for Rust
☆3,406 · Updated this week
facebookincubator / velox
A composable and fully extensible C++ execution engine library for data management systems.
☆4,171 · Updated this week
MaterializeInc / materialize
The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL
☆6,334 · Updated this week
pola-rs / polars
Extremely fast Query Engine for DataFrames, written in Rust
☆39,021 · Updated this week
apache / auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,776 · Updated this week
apache / opendal
Apache OpenDAL: One Layer, All Storage.
☆5,237 · Updated this week
lance-format / lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve…
☆6,793 · Updated this week
duckdb / duckdb
DuckDB is an analytical in-process SQL database management system
☆39,427 · Updated this week
substrait-io / substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
☆1,534 · Updated this week
jorgecarleitao / arrow2
Transmute-free Rust library to work with the Arrow format
☆1,064 · Feb 27, 2024 · Updated 2 years ago
GreptimeTeam / greptimedb
The open-source Observability 2.0 database. One engine for metrics, logs, and traces — replacing Prometheus, Loki & ES.
☆6,471 · Updated this week
TimelyDataflow / timely-dataflow
A modular implementation of timely dataflow in Rust
☆3,627 · Updated this week
quickwit-oss / tantivy
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
☆15,551 · Updated this week
apache / iceberg
Apache Iceberg
☆9,047 · Updated this week
apache / horaedb
Apache HoraeDB (incubating) is a high-performance, distributed, cloud native time-series database.
☆2,831 · Feb 5, 2026 · Updated 5 months ago
apache / datafusion-python
Apache DataFusion Python Bindings
☆593 · Updated this week
tikv / tikv
Distributed transactional key-value database, originally created to complement TiDB
☆16,759 · Updated this week
vectordotdev / vector
A high-performance observability data pipeline.
☆22,184 · Updated this week
quickwit-oss / quickwit
Cloud-native OSS search engine for observability
☆11,410 · Updated this week
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
vortex-data / vortex
An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now…
☆3,088 · Updated this week
risinglightdb / risinglight
An educational OLAP database system.
☆1,836 · Aug 10, 2025 · Updated 11 months ago
Eventual-Inc / Daft
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
☆5,627 · Updated this week
fluvio-community / fluvio
🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.
☆5,241 · Jul 6, 2026 · Updated last week
slatedb / slatedb
A cloud native embedded storage engine built on object storage.
☆3,207 · Updated this week
tensorbase / tensorbase
TensorBase is a new big data warehousing with modern efforts.
☆1,459 · May 10, 2022 · Updated 4 years ago
erikgrinaker / toydb
Distributed SQL database in Rust, written as an educational project
☆7,261 · Jun 14, 2026 · Updated last month
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,906 · Updated this week
trinodb / trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
☆13,025 · Updated this week
spacejam / sled
the champagne of beta embedded databases
☆9,045 · Apr 4, 2026 · Updated 3 months ago