Construct a modern data stack and orchestrate the workflows to create high-quality data for analytics and ML applications.
☆242 · Sep 12, 2022 · Updated 3 years ago
Alternatives and similar repositories for data-engineering
Users interested in data-engineering are comparing it to the libraries listed below.
- Learn how to create reliable ML systems by testing code, data and models. ☆93 · Sep 12, 2022 · Updated 3 years ago
- Using a feature store to connect the DataOps and MLOps workflows to enable collaborative teams to develop efficiently. ☆60 · Sep 12, 2022 · Updated 3 years ago
- Learn how to monitor ML systems to identify and mitigate sources of drift before model performance decays. ☆103 · Sep 12, 2022 · Updated 3 years ago
- Learn how to design, develop, deploy and iterate on production-grade ML applications. ☆3,333 · Aug 16, 2024 · Updated last year
- Data Engineer Roadmaps as Projects Funnel ☆11 · Aug 10, 2022 · Updated 3 years ago
- This is a repository of scripts developed as part of the 2020 ENCMP100 Section B3 lecture taught at the University of Alberta. ☆10 · Apr 2, 2020 · Updated 6 years ago
- learning-by-doing data model built with dbt-core ☆17 · Apr 10, 2026 · Updated last week
- YATO: Yet Another deep learning based Text analysis Open toolkit ☆47 · Oct 11, 2023 · Updated 2 years ago
- Data Engineering Bootcamp 2021 ☆13 · Aug 8, 2023 · Updated 2 years ago
- An MLflow Provider Package for Apache Airflow ☆26 · Oct 22, 2025 · Updated 5 months ago
- 📒(GitBook) A curated list of awesome Data Engineering resources ☆39 · Aug 27, 2025 · Updated 7 months ago
- A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation an… ☆23 · Nov 21, 2023 · Updated 2 years ago
- This repository hosts materials for the Docker for Data Engineers workshop, offering hands-on exercises and resources tailored for data e… ☆17 · May 23, 2024 · Updated last year
- Practical Data Engineering: A Hands-On Real-Estate Project Guide ☆799 · Mar 10, 2026 · Updated last month
- ☆15 · Jul 1, 2022 · Updated 3 years ago
- ☆30 · Jul 29, 2023 · Updated 2 years ago
- Easily import a module and mock its dependencies in an isolated way. ☆13 · May 19, 2022 · Updated 3 years ago
- Dynamic batching for Document Layout and OCR, suitable for RAG, with extra tools. ☆14 · Nov 25, 2024 · Updated last year
- Training HuggingFace models using fastai ☆11 · Jul 22, 2021 · Updated 4 years ago
- ☆24 · May 11, 2025 · Updated 11 months ago
- ☆15 · Apr 1, 2024 · Updated 2 years ago
- 💯 OSS version of Deepchecks' monitoring platform, synced from https://github.com/deepchecks/monitoring ☆16 · Jun 3, 2025 · Updated 10 months ago
- Simple script to re-rank images using OpenAI's CLIP (https://github.com/openai/CLIP). ☆15 · May 3, 2021 · Updated 4 years ago
- This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQ… ☆19 · Mar 25, 2026 · Updated 3 weeks ago
- Fine-tune an LLM to perform batch inference and online serving. ☆120 · May 29, 2025 · Updated 10 months ago
- Repository of material for Spark with Python, popularly known as PySpark. ☆20 · Aug 18, 2021 · Updated 4 years ago
- Retrieval Augmented Generation applications ☆26 · Oct 17, 2023 · Updated 2 years ago
- Multimodal retrieval in art with context embeddings. ☆11 · Jan 5, 2022 · Updated 4 years ago
- Keyword Extraction and Analysis Pipeline & Application with KeyBERT and Taipy ☆16 · Apr 18, 2023 · Updated 3 years ago
- This is a record of all my coding practice, including Data Manipulation, Data Structures and Algorithms, and Data Visualization. ☆13 · Apr 7, 2020 · Updated 6 years ago
- This project focuses on building a robust data pipeline using Apache Airflow to automate the ingestion of weather data from the OpenWeath… ☆22 · Feb 3, 2026 · Updated 2 months ago
- This course is designed to provide learners with the fundamental skills needed for data engineering using Python. The objective is to int… ☆27 · Aug 15, 2024 · Updated last year
- The best place to learn data engineering. Built and maintained by the data engineering community. ☆1,920 · Apr 6, 2026 · Updated last week
- Optimizing Hyperparameters with Conformal Quantile Regression ☆10 · May 22, 2023 · Updated 2 years ago
- Open Source Annotation Tools for Computer Vision and NLP tasks ☆57 · Aug 4, 2021 · Updated 4 years ago
- Set up your Ubuntu system with essential and fun packages ☆22 · May 18, 2024 · Updated last year
- An Awesome List of Open-Source Data Engineering Projects ☆3,143 · Oct 4, 2024 · Updated last year
- Collection of Python utility scripts & OOP basic demo | #SE ☆14 · Jan 8, 2025 · Updated last year
- ☆10 · Jun 24, 2021 · Updated 4 years ago