treeverse / lakeFS
lakeFS - Data version control for your data lake | Git for data
☆4,518 · Updated this week
Alternatives and similar repositories for lakeFS:
Users who are interested in lakeFS are comparing it to the libraries listed below.
- The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data. ☆5,859 · Updated this week
- Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://data… ☆8,095 · Updated this week
- Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface. ☆2,005 · Updated this week
- Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes. ☆2,888 · Updated 3 months ago
- Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, v… ☆4,119 · Updated this week
- Nessie: Transactional Catalog for Data Lakes with Git-like semantics ☆1,101 · Updated this week
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,188 · Updated 2 months ago
- Apache Iceberg ☆6,767 · Updated this week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,474 · Updated last week
- The Open Source Feature Store for Machine Learning ☆5,722 · Updated this week
- Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to wr… ☆1,903 · Updated this week
- Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code. ☆1,810 · Updated this week
- Collect, aggregate, and visualize a data ecosystem's metadata ☆1,819 · Updated this week
- A GPU-powered real-time analytics storage and query engine. ☆3,042 · Updated 6 months ago
- Apache Pinot - A realtime distributed OLAP datastore ☆5,598 · Updated this week
- Distributed data engine for Python/SQL designed for the cloud, powered by Rust ☆2,476 · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,032 · Updated 3 months ago
- The Universal Storage Engine ☆1,887 · Updated this week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆7,755 · Updated this week
- Dremio - the missing link in modern data ☆1,401 · Updated 2 months ago
- Apache DataFusion SQL Query Engine ☆6,628 · Updated this week
- re_data - fix data issues before your users & CEO would discover them ☆1,563 · Updated 8 months ago
- Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM! ☆9,859 · Updated this week
- Efficient data transformation and modeling framework that is backwards compatible with dbt. ☆1,961 · Updated this week
- BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF. ☆1,941 · Updated 2 years ago
- An Open Standard for lineage metadata collection ☆1,818 · Updated this week
- Malloy is an experimental language for describing data relationships and transformations. ☆2,027 · Updated this week
- A native Rust library for Delta Lake, with bindings into Python ☆2,483 · Updated this week
- Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. ☆5,924 · Updated this week
- Build data pipelines, the easy way 🛠️ ☆4,099 · Updated last year