Developed an ETL pipeline for a data lake that extracts data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables. Lake processing: Spark; lake storage: S3.
☆17 · Oct 1, 2019 · Updated 6 years ago
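The extract, transform, load flow described above can be sketched in plain Python. This is a minimal sketch under loud assumptions: local newline-delimited JSON files stand in for S3, lists of dicts stand in for Spark DataFrames, and the event fields and the `users`/`songs`/`songplays` table names are hypothetical, not taken from the project.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical raw event records; field names are illustrative, not the project's schema.
RAW_EVENTS: List[Row] = [
    {"user_id": 1, "first_name": "Ana", "song_id": "S1", "title": "Intro", "artist": "Band A", "ts": 1000},
    {"user_id": 2, "first_name": "Ben", "song_id": "S1", "title": "Intro", "artist": "Band A", "ts": 1005},
    {"user_id": 1, "first_name": "Ana", "song_id": "S2", "title": "Outro", "artist": "Band B", "ts": 1010},
]


def extract(src: Path) -> List[Row]:
    """Read newline-delimited JSON (local stand-in for reading raw data from S3)."""
    with src.open() as f:
        return [json.loads(line) for line in f if line.strip()]


def transform(events: List[Row]) -> Dict[str, List[Row]]:
    """Split raw events into deduplicated dimension tables plus one fact table."""
    users: Dict[object, Row] = {}
    songs: Dict[object, Row] = {}
    for e in events:
        # setdefault keeps one row per key, deduplicating each dimension.
        users.setdefault(e["user_id"], {"user_id": e["user_id"], "first_name": e["first_name"]})
        songs.setdefault(e["song_id"], {"song_id": e["song_id"], "title": e["title"], "artist": e["artist"]})
    # The fact table keeps one row per event and references dimensions only by key.
    songplays = [
        {"songplay_id": i, "user_id": e["user_id"], "song_id": e["song_id"], "ts": e["ts"]}
        for i, e in enumerate(events)
    ]
    return {"users": list(users.values()), "songs": list(songs.values()), "songplays": songplays}


def load(tables: Dict[str, List[Row]], out_dir: Path) -> None:
    """Write each table to its own directory (local stand-in for writing tables back to S3)."""
    for name, rows in tables.items():
        table_dir = out_dir / name
        table_dir.mkdir(parents=True, exist_ok=True)
        with (table_dir / "part-00000.json").open("w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")


def run_pipeline(src: Path, out_dir: Path) -> Dict[str, List[Row]]:
    """Extract, transform, and load; returns the tables for inspection."""
    tables = transform(extract(src))
    load(tables, out_dir)
    return tables


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "raw" / "events.json"
        src.parent.mkdir()
        src.write_text("".join(json.dumps(e) + "\n" for e in RAW_EVENTS))
        for name, rows in run_pipeline(src, root / "lake").items():
            print(name, len(rows))  # users 2 / songs 2 / songplays 3, one per line
```

With real Spark, `extract` would typically become a `spark.read.json(...)` call and `load` a `DataFrameWriter.parquet(...)` call on an S3 path; the split into keyed dimension tables and a fact table is the part that carries over conceptually.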
Alternatives and similar repositories for udacity-data-eng-proj4
Users interested in udacity-data-eng-proj4 are comparing it to the repositories listed below.
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Nov 22, 2021 · Updated 4 years ago
- A repo to track data engineering projects ☆13 · Nov 11, 2022 · Updated 3 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 · Aug 14, 2023 · Updated 2 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation,… ☆89 · Nov 22, 2021 · Updated 4 years ago
- Neo4j 3.x accessed via bolt JS driver, plugged into D3 v4 force simulation ☆18 · Apr 2, 2017 · Updated 8 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 4 years ago
- Created data models, built data warehouses and data lakes, automated data pipelines, and worked with massive datasets. ☆13 · Jul 16, 2019 · Updated 6 years ago
- Repo which holds the materials for the EMR Zero To Hero ☆27 · May 7, 2022 · Updated 3 years ago
- Udacity Data Streaming Nanodegree Program ☆24 · Feb 20, 2021 · Updated 5 years ago
- Build a WAR with Maven and the SparkJava framework ☆26 · Jul 17, 2024 · Updated last year
- Example Python and R code for Cloudera Machine Learning (CML) training ☆14 · Dec 1, 2020 · Updated 5 years ago
- A tool for translating Scala source code into readable and maintainable Java code ☆13 · Jan 3, 2026 · Updated 2 months ago
- Application built with Node.js and Vue.js. ☆15 · Jan 6, 2025 · Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 2 years ago
- A batch-processing system based on Spring Boot and Spring Batch. ☆10 · Sep 10, 2018 · Updated 7 years ago
- Simple Python script that converts all Excel files (xls, xlsx, xlsm, csv) in a directory into xlsb files. ☆10 · Mar 13, 2023 · Updated 2 years ago
- Framework for studying cryptographic hash functions using SAT. ☆10 · Dec 21, 2021 · Updated 4 years ago
- Final Project for Data Engineering Zoomcamp Course 2024 🧙🔥 ☆11 · Apr 17, 2024 · Updated last year
- Example code that downloads all option chains for a symbol using pyetrade (V1 E*TRADE API) ☆10 · Sep 28, 2021 · Updated 4 years ago
- Python library for the simulation of probabilistic circuits. ☆11 · Feb 1, 2026 · Updated last month
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆32 · Aug 14, 2023 · Updated 2 years ago
- ☆11 · Sep 1, 2022 · Updated 3 years ago
- 🍕🍔🍟 Delimenú is a web application that lets restaurants digitize their menus so that their users can feel… ☆10 · Nov 19, 2024 · Updated last year
- A primer on using the 'synthpop' package for the biobehavioral sciences ☆11 · Mar 31, 2020 · Updated 5 years ago
- This project contains a sample Progressive Web App (PWA) that connects to Aras Innovator via RESTful API and OAuth authentication. ☆10 · Oct 23, 2020 · Updated 5 years ago
- CSC 424 Advanced Database Management Systems ☆16 · Jan 1, 2020 · Updated 6 years ago
- zdh series: a Java-based business risk-control engine ☆13 · Jan 24, 2026 · Updated last month
- Spark projects based on the book "Machine Learning with Spark" ☆10 · Jun 3, 2017 · Updated 8 years ago
- 📑 A minimalist Android todo app based on Clean Architecture with MVP (for the presentation layer) using architecture components library … ☆10 · Apr 22, 2020 · Updated 5 years ago
- Complete code for JWT authentication and role-based authorization in GraphQL ☆12 · Feb 8, 2019 · Updated 7 years ago
- Real-time IoT data streaming from smartphone sensors ☆11 · Aug 26, 2020 · Updated 5 years ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆90 · Jul 17, 2019 · Updated 6 years ago
- Deploy Apache Doris (incubating) FE & BE using shell scripts ☆11 · Jul 8, 2019 · Updated 6 years ago
- User-profiling code that uses algorithms to infer users' gender and age ratios ☆11 · Dec 18, 2017 · Updated 8 years ago
- Devery Protocol Smart Contracts ☆12 · Dec 22, 2018 · Updated 7 years ago
- Python and Scala APIs for enhanced Spark analytics ☆12 · Mar 15, 2017 · Updated 8 years ago
- ☆11 · Dec 11, 2022 · Updated 3 years ago
- An es (Elasticsearch) word-segmentation plugin based on the HanLP toolkit ☆10 · Mar 20, 2018 · Updated 7 years ago
- A Java port of the C# PasswordDeriveBytes class ☆13 · Jun 20, 2017 · Updated 8 years ago