pditommaso / awesome-pipeline
A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
☆6,190 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for awesome-pipeline
- A curated list of awesome ETL frameworks, libraries, and software. ☆3,280 Updated 3 months ago
- Curated list of resources about Apache Airflow ☆3,683 Updated 2 months ago
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ☆17,858 Updated last month
- Repository for the CWL standards. Use https://cwl.discourse.group/ for support 😊 ☆1,454 Updated 2 months ago
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ☆2,079 Updated 10 months ago
- the portable Python dataframe library ☆5,281 Updated this week
- Always know what to expect from your data. ☆9,970 Updated this week
- Docker Apache Airflow ☆3,775 Updated last year
- 📚 Parameterize, execute, and analyze notebooks ☆5,962 Updated last month
- An orchestration platform for the development, production, and observation of data assets. ☆11,649 Updated this week
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,290 Updated last month
- A next-generation curated knowledge sharing platform for data scientists and other technical professions. ☆5,481 Updated 2 months ago
- ETL best practices with airflow, with examples ☆1,293 Updated last month
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows ☆37,024 Updated this week
- Data-Centric Pipelines and Data Versioning ☆6,173 Updated this week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,434 Updated last week
- a curated list of awesome streaming frameworks, applications, etc ☆2,694 Updated 2 months ago
- Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems ☆8,226 Updated this week
- Actively curated list of awesome BI tools. PRs welcome! ☆2,089 Updated 2 months ago
- Parallel computing with task scheduling ☆12,576 Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆9,947 Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,307 Updated last month
- A curated list of data engineering tools for software developers ☆6,791 Updated 2 weeks ago
- Dynamically generate Apache Airflow DAGs from YAML configuration files ☆1,197 Updated this week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ☆16,456 Updated this week
- Quickly and accurately render even the largest data. ☆3,320 Updated this week
- A series of DAGs/Workflows to help maintain the operation of Airflow ☆1,680 Updated 4 months ago
- A light-weight, flexible, and expressive statistical data testing library ☆3,370 Updated last week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆7,578 Updated this week