☆32Aug 13, 2018Updated 7 years ago
Alternatives and similar repositories for data-engineering
Users that are interested in data-engineering are comparing it to the libraries listed below.
- ☆11Jul 13, 2020Updated 5 years ago
- ☆25Jun 11, 2020Updated 5 years ago
- ☆19Dec 16, 2021Updated 4 years ago
- Example repo to create end to end tests for data pipeline.☆25Jun 14, 2024Updated last year
- How to build an awesome data engineering team☆101Sep 11, 2019Updated 6 years ago
- A data engineering project with Airflow, dbt, Terraform, GCP and much more!☆26Nov 8, 2022Updated 3 years ago
- Udacity Data Engineering Nanodegree Capstone Project☆37May 9, 2020Updated 5 years ago
- JS30 ++☆10Jun 29, 2020Updated 5 years ago
- ☆28Nov 10, 2021Updated 4 years ago
- Introduction to MLflow and Using MLflow with an Anaconda Environment☆11Sep 17, 2020Updated 5 years ago
- Implementation of an ETL process for real-time sentiment analysis of tweets with Docker, Apache Kafka, Spark Streaming, MongoDB and Delta…☆19May 6, 2023Updated 2 years ago
- A ready to use template for the CRISP-DM data science workflow☆13Nov 14, 2025Updated 4 months ago
- Source Code for 'Beginning Apache Spark 3' by Hien Luu☆13Oct 14, 2021Updated 4 years ago
- ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…☆11Mar 9, 2022Updated 4 years ago
- Scripts and code written whilst learning and experimenting with machine learning☆13Jul 18, 2022Updated 3 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc…☆23May 14, 2022Updated 3 years ago
- ☆40Nov 2, 2021Updated 4 years ago
- Distributed Data Systems with Azure Databricks, published by Packt☆12Jan 18, 2023Updated 3 years ago
- An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS ap…☆25Dec 7, 2022Updated 3 years ago
- 🐋 Docker image for AWS Glue Spark/Python☆23Sep 5, 2023Updated 2 years ago
- ☆13Oct 6, 2019Updated 6 years ago
- Google FSI Accelerator Pattern☆13Jun 18, 2024Updated last year
- To collect and promote FOSS projects started by and contributed to by Vietnamese☆12Sep 24, 2018Updated 7 years ago
- This repository contains all the resources and solution to quizzes given and asked in IBM Data Science Professional Certification.☆13Sep 16, 2022Updated 3 years ago
- ☆13May 1, 2020Updated 5 years ago
- This sample shows how to create two Azure Container Apps that use OpenAI, LangChain, ChromaDB, and Chainlit using Terraform.☆12May 7, 2024Updated last year
- Data Engineering Capstone☆17Oct 10, 2019Updated 6 years ago
- ☆20Nov 2, 2018Updated 7 years ago
- Command line tools for verifying Arduino sketches, uploading them to boards, validating AUnit unit tests, and integrating with continuous…☆21Jun 26, 2023Updated 2 years ago
- To share with friends☆12Sep 2, 2016Updated 9 years ago
- A web app that reads a list of urls and displays their current status.☆12Dec 4, 2024Updated last year
- Full stack data engineering tools and infrastructure set-up☆57Feb 13, 2021Updated 5 years ago
- Face login using face recognition with OpenCV in Python☆14Aug 6, 2019Updated 6 years ago
- ☆12Oct 15, 2023Updated 2 years ago
- Spark Standalone & Livy☆11Jul 13, 2021Updated 4 years ago
- Basic Spark examples.☆11Jan 12, 2021Updated 5 years ago
- This is the code repo for the O'Reilly book "Data Science: The Hard Parts"☆18Jun 2, 2024Updated last year
- Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution☆67Aug 5, 2020Updated 5 years ago
- ☆13Jan 11, 2024Updated 2 years ago