facebookincubator/nimble

New and extensible file format for storage of large columnar datasets.

☆728

Alternatives and similar repositories for nimble

Users who are interested in nimble are comparing it to the libraries listed below.

vortex-data / vortex
An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now…
☆3,096 · Updated this week
facebookincubator / velox
A composable and fully extensible C++ execution engine library for data management systems.
☆4,178 · Updated this week
cwida / FastLanes
Next-Gen Big Data File Format
☆687 · Apr 22, 2026 · Updated 3 months ago
lance-format / lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve…
☆6,850 · Updated this week
substrait-io / substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
☆1,536 · Updated this week
apache / datafusion-comet
Apache DataFusion Comet Spark Accelerator
☆1,233 · Updated this week
maxi-k / btrblocks
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
☆285 · Apr 7, 2025 · Updated last year
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,577 · Updated this week
GlareDB / glaredb
GlareDB: A light and fast SQL database for analytics
☆1,019 · Nov 14, 2025 · Updated 8 months ago
apache / datafusion
Apache DataFusion SQL Query Engine
☆9,014 · Updated this week
facebookincubator / axiom
Axiom is a set of reusable and extensible components designed to be compatible with Velox. Its primary purpose is to simplify the process…
☆79 · Updated this week
durner / AnyBlob
AnyBlob - A Universal Cloud Object Storage Download Manager Built For Cost-Throughput Optimal Analytics!
☆153 · Jul 9, 2026 · Updated 2 weeks ago
slatedb / slatedb
A cloud native embedded storage engine built on object storage.
☆3,225 · Updated this week
XiangpengHao / liquid-cache
Pushdown cache for DataFusion
☆417 · Jun 13, 2026 · Updated last month
sirius-db / sirius
GPU-native composable analytics engine
☆1,024 · Updated this week
apache / datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
☆2,095 · Updated this week
spiraldb / fastlanes
Rust implementation of the FastLanes compression library
☆183 · Updated this week
apache / datafusion-ray
Apache DataFusion Ray
☆230 · May 15, 2026 · Updated 2 months ago
apache / auron
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,780 · Updated this week
apache / arrow-adbc
Database connectivity API standard and libraries for Apache Arrow
☆615 · Updated this week
Eventual-Inc / Daft
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
☆5,660 · Updated this week
tonbo-io / tonbo
Tonbo is an embedded database for serverless and edge runtimes.
☆1,591 · Jul 18, 2026 · Updated last week
lakekeeper / lakekeeper
Lakekeeper is an Apache-Licensed, secure, fast and easy-to-use Apache Iceberg REST Catalog written in Rust.
☆1,399 · Updated this week
apache / incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processin…
☆1,196 · Updated this week
apache / iceberg-rust
Apache Iceberg
☆1,353 · Updated this week
awslabs / analytics-accelerator-s3
Analytics Accelerator Library for Amazon S3 is an open source library that accelerates data access from client applications to Amazon S3.
☆71 · Jul 9, 2026 · Updated 2 weeks ago
apache / hudi-rs
The native Rust implementation for Apache Hudi, with C++ & Python API bindings.
☆278 · Jun 26, 2026 · Updated last month
foyer-rs / foyer
Hybrid in-memory and disk cache in Rust
☆1,778 · Jul 11, 2026 · Updated 2 weeks ago
apache / polaris
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
☆2,023 · Updated this week
spiraldb / fsst
Pure-Rust implementation of Fast Static Symbol Tables string compression
☆225 · Updated this week
future-file-format / F3
[SIGMOD 2026] F3: The Open-Source Data File Format for the Future
☆746 · Nov 3, 2025 · Updated 8 months ago
cmu-db / optd-original
CMU-DB's Cascades optimizer framework
☆405 · Jan 6, 2025 · Updated last year
unitycatalog / unitycatalog-rs
Open, Multi-modal Catalog for Data & AI, written in Rust
☆86 · Sep 30, 2024 · Updated last year
duckdblabs / duckdb-substrait-demo
☆17 · Jan 17, 2023 · Updated 3 years ago
datafusion-contrib / datafusion-distributed
Library for bringing distributed capabilities to Apache DataFusion
☆123 · Updated this week
clflushopt / tpchgen-rs
TPC-H benchmark data generation in pure Rust
☆250 · Updated this week
projectnessie / nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
☆1,483 · Updated this week
feldera / feldera
The Feldera Incremental Computation Engine
☆2,007 · Updated this week
arrow-udf / arrow-udf
A User-Defined Function Framework for Apache Arrow.
☆113 · Updated this week