pachyderm / pachyderm
Data-Centric Pipelines and Data Versioning
☆6,173 Updated this week
Related projects
Alternatives and complementary repositories for pachyderm
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ☆17,858 Updated last month
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,567 Updated this week
- Machine Learning Toolkit for Kubernetes ☆14,351 Updated 2 weeks ago
- High-Performance Serverless event and data processing platform ☆5,306 Updated this week
- 📚 Parameterize, execute, and analyze notebooks ☆5,962 Updated last month
- An open-source graph database ☆14,853 Updated 4 months ago
- Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems ☆8,224 Updated this week
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,377 Updated this week
- Quilt is a data mesh for connecting people with actionable data ☆1,328 Updated this week
- M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform ☆4,763 Updated this week
- PipelineAI ☆4,168 Updated 6 months ago
- the portable Python dataframe library ☆5,267 Updated this week
- A next-generation curated knowledge sharing platform for data scientists and other technical professions. ☆5,481 Updated 2 months ago
- 🦉 Data Versioning and ML Experiments ☆13,860 Updated this week
- Production infrastructure for machine learning at scale ☆8,020 Updated 4 months ago
- A curated list of awesome ETL frameworks, libraries, and software. ☆3,279 Updated 3 months ago
- Parallel computing with task scheduling ☆12,576 Updated this week
- Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics ☆14,529 Updated this week
- Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. ☆5,757 Updated this week
- Always know what to expect from your data. ☆9,970 Updated this week
- A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin ☆6,187 Updated 3 weeks ago
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,700 Updated 3 months ago
- Grumpy is a Python to Go source code transcompiler and runtime. ☆10,547 Updated 2 years ago
- 📘 The interactive computing suite for you! ✨ ☆6,205 Updated 10 months ago
- Python Stream Processing ☆6,743 Updated 3 months ago
- Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logi… ☆8,302 Updated this week
- Visualizations for machine learning datasets ☆7,355 Updated last year
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,290 Updated last month
- The Open Source Feature Store for Machine Learning ☆5,592 Updated this week
- Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f… ☆1,798 Updated 11 months ago