treeverse / lakeFS
lakeFS - Data version control for your data lake | Git for data
☆4,459 Updated this week
Related projects
Alternatives and complementary repositories for lakeFS
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆7,608 Updated this week
- Apache DataFusion SQL Query Engine ☆6,312 Updated this week
- Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. ☆5,787 Updated this week
- Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, v… ☆3,964 Updated this week
- The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data. ☆5,809 Updated this week
- The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lak… ☆16,219 Updated this week
- Compare tables within or across databases ☆2,945 Updated 6 months ago
- Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time E… ☆7,052 Updated this week
- 𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://data… ☆7,867 Updated this week
- Nessie: Transactional Catalog for Data Lakes with Git-like semantics ☆1,040 Updated this week
- A native Rust library for Delta Lake, with bindings into Python ☆2,325 Updated this week
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,166 Updated 2 weeks ago
- Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes. ☆2,847 Updated last month
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ☆1,913 Updated last week
- Malloy is an experimental language for describing data relationships and transformations. ☆1,996 Updated this week
- The open source high performance ELT framework powered by Apache Arrow ☆5,878 Updated this week
- Apache DataFusion Ballista Distributed Query Engine ☆1,549 Updated this week
- An Open Standard for lineage metadata collection ☆1,772 Updated this week
- Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to wr… ☆1,851 Updated this week
- An orchestration platform for the development, production, and observation of data assets. ☆11,711 Updated this week
- Apache Iceberg ☆6,473 Updated this week
- Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface. ☆1,956 Updated this week
- An open protocol for secure data sharing ☆770 Updated last week
- Postgres with GPUs for ML/AI apps. ☆6,038 Updated last week
- re_data - fix data issues before your users & CEO would discover them 😊 ☆1,552 Updated 6 months ago
- First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business… ☆1,215 Updated last month
- Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with struc… ☆11,533 Updated this week
- Python Stream Processing ☆1,565 Updated this week
- A GPU-powered real-time analytics storage and query engine. ☆3,032 Updated 4 months ago
- Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeli… ☆4,118 Updated this week