I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uses Kafka to scrape, process, and load data onto S3 in JSON format. With a producer-consumer architecture, I perform minor transformations to ensure that the data is in the right format before it is loaded onto S3.
☆29May 2, 2023Updated 2 years ago
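The producer-consumer flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: an in-memory `queue.Queue` stands in for the Kafka topic, the field names (`symbol`, `price`, `scraped_at`) and the helper names `transform`, `produce`, and `consume` are hypothetical, and the final S3 upload is left out.

```python
import json
import queue
from datetime import datetime, timezone


def transform(raw: dict) -> dict:
    """Minor clean-up so every record has the same JSON shape.

    The field names here are assumptions; the project does not document its schema.
    """
    return {
        "symbol": str(raw["symbol"]).strip().upper(),
        "price_usd": float(str(raw["price"]).replace("$", "").replace(",", "")),
        "scraped_at": raw.get("scraped_at") or datetime.now(timezone.utc).isoformat(),
    }


def produce(topic: queue.Queue, scraped: list) -> None:
    """Producer side: serialize each scraped record to bytes and publish it."""
    for raw in scraped:
        topic.put(json.dumps(raw).encode("utf-8"))


def consume(topic: queue.Queue) -> str:
    """Consumer side: drain the topic, transform each record, build a JSON Lines body."""
    lines = []
    while not topic.empty():
        raw = json.loads(topic.get().decode("utf-8"))
        lines.append(json.dumps(transform(raw), sort_keys=True))
    return "\n".join(lines)


# The queue is a stand-in for a Kafka topic (assumption for this sketch).
topic = queue.Queue()
produce(topic, [
    {"symbol": " btc ", "price": "$27,950.10", "scraped_at": "2023-05-02T00:00:00+00:00"},
])

# In a real pipeline this string would be the object body uploaded to S3.
body = consume(topic)
print(body)
```

Keeping the transformation in a plain function separate from the messaging code means it can be checked without a running broker; in a real deployment the queue would be replaced by a Kafka producer and consumer, and `body` would be written to an S3 object.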
Alternatives and similar repositories for real-time_crypto_data_pipeline_using_kafka
Users that are interested in real-time_crypto_data_pipeline_using_kafka are comparing it to the libraries listed below
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health☆29Apr 29, 2023Updated 2 years ago
- sql-for-data-engineering-course☆18May 12, 2023Updated 2 years ago
- ☆19May 27, 2023Updated 2 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc…☆23May 14, 2022Updated 3 years ago
- In this project, we will build an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro…☆25May 6, 2023Updated 2 years ago
- YouTube tutorial project☆108Oct 17, 2023Updated 2 years ago
- R toolbox to explore the TRON blockchain☆10Jul 18, 2021Updated 4 years ago
- This project aims to build a travel recommendation application using Google Places API and OpenAI LLM.☆11Mar 19, 2024Updated last year
- ☆16Feb 20, 2026Updated 2 weeks ago
- ☆11Jan 13, 2024Updated 2 years ago
- Analyse Spotify playlists, albums and artists.☆35Nov 15, 2022Updated 3 years ago
- A Gentle Introduction to RAG☆15Oct 8, 2024Updated last year
- ☆15Sep 7, 2025Updated 6 months ago
- Full code written for the Twilio blog https://www.twilio.com/blog/media-file-storage-python-flask-amazon-s3-buckets☆11May 4, 2024Updated last year
- A fun little data analysis project to find out whether Americans prefer Mexican food over Italian or Chinese food.☆12Sep 11, 2017Updated 8 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree☆10Jun 6, 2021Updated 4 years ago
- ☆11Dec 9, 2020Updated 5 years ago
- Citibike Analysis for Data Science Certification☆10May 25, 2018Updated 7 years ago
- Capstone Project for the IBM Data Engineering Professional Certification.☆13Mar 7, 2022Updated 4 years ago
- The project focuses on the drowsiness of IT employees, drivers, pilots, crane operators, students, etc. These people need a system which ca…☆14Sep 13, 2018Updated 7 years ago
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste…☆12Oct 11, 2023Updated 2 years ago
- R package for tracking Covid19 cases in San Francisco☆12Apr 2, 2023Updated 2 years ago
- Analyze coinbase orderbook in real-time in Python with Bytewax☆11Apr 23, 2024Updated last year
- ☆11Aug 11, 2022Updated 3 years ago
- ☆11Nov 9, 2022Updated 3 years ago
- Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped☆48Oct 27, 2025Updated 4 months ago
- ☆11Mar 31, 2025Updated 11 months ago
- This data project can be used as a take-home assignment to learn Pyspark and Data Engineering.☆17Feb 19, 2023Updated 3 years ago
- Business challenge that requires building a data platform for retailer data analytics.☆18Feb 19, 2023Updated 3 years ago
- ☆10Mar 23, 2023Updated 2 years ago
- In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO,…☆11Jun 27, 2023Updated 2 years ago
- Kafka Connect: How to create a real time data pipeline using Change Data Capture (CDC)☆13Jan 24, 2021Updated 5 years ago
- ☆16Jun 11, 2018Updated 7 years ago
- R-package to interface with the PlantNet API☆12Mar 9, 2022Updated 4 years ago
- Google Advanced Data Analytics Coursera☆12Jul 2, 2023Updated 2 years ago
- Real-time sentiment analysis on tweets using tweepy and kafka. Graphed using the output of a neural network and Dash/Plotly.☆14Nov 3, 2020Updated 5 years ago
- This repository demonstrates how to leverage OpenAI's GPT-4 models with JSON Strict Mode to extract structured data from web pages. It c…☆20Aug 14, 2024Updated last year
- Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, DBT, Mage AI, and Dash.☆14Jun 26, 2023Updated 2 years ago
- A simple machine learning pipeline built using Apache Airflow☆15Nov 22, 2022Updated 3 years ago