I use a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uses Kafka to scrape, process, and load data onto S3 in JSON format. Using a producer-consumer architecture, I apply minor transformations so that the data is in the right format before it is loaded onto S3.
☆29 · May 2, 2023 · Updated 2 years ago
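The producer-consumer flow described above can be illustrated with a minimal sketch. It is not the repository's actual code: an in-memory `queue.Queue` stands in for the Kafka topic, a plain dict stands in for the S3 bucket, and the field names (`symbol`, `price`, `ts`), the `transform` rules, and the `object_key` layout are all hypothetical assumptions chosen for illustration.

```python
import json
import queue
from datetime import datetime, timezone


def transform(raw: dict) -> dict:
    """Apply minor clean-up to one scraped record (field names are assumed)."""
    return {
        "symbol": str(raw["symbol"]).strip().upper(),
        # Strip currency formatting such as "$27,123.50" before converting.
        "price_usd": float(str(raw["price"]).replace("$", "").replace(",", "")),
        # Store the scrape time as an ISO 8601 UTC timestamp.
        "scraped_at": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
    }


def object_key(record: dict) -> str:
    """Build a hypothetical object key, partitioned by day."""
    day = record["scraped_at"][:10]
    return f"crypto/{day}/{record['symbol']}-{record['scraped_at']}.json"


def produce(topic: queue.Queue, scraped_rows) -> None:
    """Producer side: serialize each scraped row and publish it."""
    for row in scraped_rows:
        topic.put(json.dumps(row).encode("utf-8"))


def consume(topic: queue.Queue, bucket: dict) -> int:
    """Consumer side: drain the topic, transform, and store JSON objects."""
    written = 0
    while True:
        try:
            message = topic.get_nowait()
        except queue.Empty:
            break
        record = transform(json.loads(message))
        bucket[object_key(record)] = json.dumps(record)
        written += 1
    return written


if __name__ == "__main__":
    topic = queue.Queue()   # stand-in for a Kafka topic
    bucket = {}             # stand-in for an S3 bucket
    produce(topic, [{"symbol": " btc ", "price": "$27,123.50", "ts": 1683000000}])
    print(consume(topic, bucket), "object(s) written")
    for key, body in bucket.items():
        print(key, body)
```

In a real deployment, the queue and the dict would presumably be replaced by a Kafka client and an S3 client; the transformation step in the middle is the part this sketch is meant to show.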
Alternatives and similar repositories for real-time_crypto_data_pipeline_using_kafka
Users who are interested in real-time_crypto_data_pipeline_using_kafka are comparing it to the repositories listed below
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health ☆29 · Apr 29, 2023 · Updated 2 years ago
- sql-for-data-engineering-course ☆18 · May 12, 2023 · Updated 2 years ago
- ☆19 · May 27, 2023 · Updated 2 years ago
- The official home for the GRID DataJam 2023 ☆21 · Sep 4, 2023 · Updated 2 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆23 · May 14, 2022 · Updated 3 years ago
- ☆146 · Jan 31, 2023 · Updated 3 years ago
- ☆22 · Apr 13, 2023 · Updated 2 years ago
- Apartments Data Pipeline using Airflow and Spark. ☆23 · Mar 28, 2022 · Updated 3 years ago
- YouTube tutorial project ☆108 · Oct 17, 2023 · Updated 2 years ago
- This project aims to build a traveling recommendation application using Google Places API and OpenAI LLM. ☆11 · Mar 19, 2024 · Updated last year
- Combining the power of machine learning and automation, we made a tool to bypass the captcha of www.geca.ac.in's MIS login using Machine L… ☆10 · Aug 19, 2022 · Updated 3 years ago
- ☆16 · Feb 20, 2026 · Updated 2 weeks ago
- Full code written for the Twilio blog https://www.twilio.com/blog/media-file-storage-python-flask-amazon-s3-buckets ☆11 · May 4, 2024 · Updated last year
- My solutions for the Udacity Data Engineering Nanodegree ☆34 · Oct 14, 2019 · Updated 6 years ago
- A Gentle Introduction to RAG ☆15 · Oct 8, 2024 · Updated last year
- R package for tracking Covid19 cases in San Francisco ☆12 · Apr 2, 2023 · Updated 2 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree ☆10 · Jun 6, 2021 · Updated 4 years ago
- This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste… ☆12 · Oct 11, 2023 · Updated 2 years ago
- ☆11 · Nov 9, 2022 · Updated 3 years ago
- Citibike Analysis for Data Science Certification ☆10 · May 25, 2018 · Updated 7 years ago
- The repository includes detailed steps to get data from GES DISC, convert HDF5 files to CSV, and plot geographic data. ☆11 · Aug 17, 2020 · Updated 5 years ago
- Acquiring and processing information on the world's largest banks ☆18 · Jun 17, 2025 · Updated 8 months ago
- ☆13 · Sep 9, 2024 · Updated last year
- ☆11 · Dec 9, 2020 · Updated 5 years ago
- Solved data engineering exercises using PySpark ☆15 · Aug 2, 2021 · Updated 4 years ago
- Project on belief embedding ☆20 · Jun 4, 2025 · Updated 9 months ago
- ☆11 · Aug 11, 2022 · Updated 3 years ago
- ☆16 · Jun 11, 2018 · Updated 7 years ago
- ☆14 · May 14, 2024 · Updated last year
- Real-time sentiment analysis on tweets using tweepy and kafka. Graphed using the output of a neural network and Dash/Plotly. ☆14 · Nov 3, 2020 · Updated 5 years ago
- ☆11 · Mar 31, 2025 · Updated 11 months ago
- ☆17 · Apr 29, 2025 · Updated 10 months ago
- Google Advanced Data Analytics Coursera ☆12 · Jul 2, 2023 · Updated 2 years ago
- ☆10 · Mar 23, 2023 · Updated 2 years ago
- This data project can be used as a take-home assignment to learn PySpark and Data Engineering. ☆17 · Feb 19, 2023 · Updated 3 years ago
- Business challenge that requires building a data platform for retailer data analytics. ☆18 · Feb 19, 2023 · Updated 3 years ago
- In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO,… ☆11 · Jun 27, 2023 · Updated 2 years ago
- R-package to interface with the PlantNet API ☆12 · Mar 9, 2022 · Updated 4 years ago
- This repo contains all iNeuron Full Stack Data Science Assignments ☆12 · Jun 6, 2023 · Updated 2 years ago