I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that scrapes data, streams it through Kafka for processing, and loads it onto S3 in JSON format. Using a producer-consumer architecture, I perform minor transformations on the consumer side so that the data is in the right format for loading onto S3.
☆29 · May 2, 2023 · Updated 2 years ago
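The description above does not include code, so the following is only a minimal sketch, under stated assumptions, of what the consumer side of such a pipeline could look like: poll a batch of scraped records, apply minor transformations, write the batch to S3 as JSON Lines, and commit offsets afterwards. The record fields (`symbol`, `price`, `scraped_at`), the `crypto/dt=.../` key layout, and the helper names `transform`, `s3_key` and `consume_batch` are hypothetical. `consumer` is assumed to behave like a `confluent_kafka.Consumer` and `s3_client` like a boto3 S3 client; both are passed in so the logic can be exercised without a live cluster. The producer side (scraping and `Producer.produce`) is not shown.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical schema: the project description does not document its record
# fields, so "symbol", "price" and "scraped_at" are assumptions.
REQUIRED_FIELDS = ("symbol", "price")


def transform(raw: Optional[bytes]) -> Optional[dict]:
    """Apply minor transformations to one scraped record, or return None to drop it."""
    if raw is None:  # e.g. a tombstone message with no payload
        return None
    try:
        record = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
    if not isinstance(record, dict) or any(f not in record for f in REQUIRED_FIELDS):
        return None
    try:
        # Scraped prices often arrive as display strings such as "$27,350.50".
        price = float(str(record["price"]).replace("$", "").replace(",", ""))
    except ValueError:
        return None
    return {
        "symbol": str(record["symbol"]).strip().upper(),
        "price": price,
        "scraped_at": record.get("scraped_at"),
    }


def s3_key(prefix: str, batch_time: datetime) -> str:
    """Hypothetical date-partitioned key layout for one uploaded batch."""
    return f"{prefix}/dt={batch_time:%Y-%m-%d}/{batch_time:%H%M%S%f}.jsonl"


def consume_batch(consumer: Any, s3_client: Any, bucket: str,
                  max_messages: int = 100, timeout: float = 1.0) -> Optional[str]:
    """Poll one batch, transform it, write it to S3 as JSON Lines, then commit.

    `consumer` is assumed to behave like confluent_kafka.Consumer and
    `s3_client` like a boto3 S3 client; returns the S3 key, or None if empty.
    """
    records = []
    for _ in range(max_messages):
        msg = consumer.poll(timeout)
        if msg is None:  # nothing arrived within the timeout
            break
        if msg.error():  # skip error events in this sketch
            continue
        cleaned = transform(msg.value())
        if cleaned is not None:
            records.append(cleaned)
    if not records:
        return None
    key = s3_key("crypto", datetime.now(timezone.utc))
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    # Commit offsets only after the upload succeeded (at-least-once delivery);
    # assumes the consumer was created with enable.auto.commit=False.
    consumer.commit(asynchronous=False)
    return key
```

In a real deployment, `consumer` might be created with `confluent_kafka.Consumer({...})` using Confluent Cloud settings such as `bootstrap.servers`, `security.protocol` set to `SASL_SSL`, `sasl.mechanisms` set to `PLAIN`, `sasl.username`/`sasl.password` (the API key and secret), a `group.id`, and `enable.auto.commit` set to `False`, followed by `consumer.subscribe([...])`; `s3_client` might be `boto3.client("s3")`. Committing only after `put_object` returns means that if the upload raises and the process restarts, the batch is re-read rather than lost, at the cost of possible duplicates in S3.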
Alternatives and similar repositories for real-time_crypto_data_pipeline_using_kafka
Users that are interested in real-time_crypto_data_pipeline_using_kafka are comparing it to the libraries listed below.
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health ☆29 · Apr 29, 2023 · Updated 2 years ago
- sql-for-data-engineering-course ☆18 · May 12, 2023 · Updated 2 years ago
- ☆145 · Jan 31, 2023 · Updated 3 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆23 · May 14, 2022 · Updated 3 years ago
- Apartments Data Pipeline using Airflow and Spark. ☆23 · Mar 28, 2022 · Updated 4 years ago
- Collection of my favorite Python packages from 2020 ☆11 · Jan 12, 2021 · Updated 5 years ago
- Data pipeline that scrapes Rust cheater Steam profiles ☆54 · Feb 13, 2022 · Updated 4 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆51 · Aug 23, 2019 · Updated 6 years ago
- YouTube tutorial project ☆108 · Oct 17, 2023 · Updated 2 years ago
- Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped ☆49 · Mar 13, 2026 · Updated 3 weeks ago
- Analyse Spotify playlists, albums and artists. ☆35 · Nov 15, 2022 · Updated 3 years ago
- A simple cli tool that deletes files matching an extension within a given directory structure. ☆12 · Sep 27, 2023 · Updated 2 years ago
- ☆16 · May 29, 2023 · Updated 2 years ago
- RedditR for Content Engagement and Recommendation ☆18 · Dec 21, 2017 · Updated 8 years ago
- Capstone Project for the IBM Data Engineering Professional Certification. ☆13 · Mar 7, 2022 · Updated 4 years ago
- Monoscope's Golang client SDK. ☆20 · Mar 1, 2026 · Updated last month
- Pyspark Spotify ETL ☆17 · Aug 19, 2021 · Updated 4 years ago
- A highly scalable microservice to handle WhatsApp, SMS and email-based notifications. ☆20 · Mar 29, 2021 · Updated 5 years ago
- Data warehouse implementation for an e-commerce website “Infibeam” that sells digital and consumer electronics. ☆22 · Jan 28, 2018 · Updated 8 years ago
- An interactive platform that enables individuals to share their unique experiences across various stages of life and emotional journeys. … ☆14 · Feb 29, 2024 · Updated 2 years ago
- Stream/batch system with Hadoop, Spark on NYC taxi data | #DE ☆26 · Sep 27, 2025 · Updated 6 months ago
- Winners solutions for [WNS Analytics Wizard 2018](https://datahack.analyticsvidhya.com/contest/wns-analytics-hackathon-2018/) ☆25 · Dec 13, 2018 · Updated 7 years ago
- ☆170 · May 20, 2022 · Updated 3 years ago
- This project aims to build a traveling recommendation application using Google Places API and OpenAI LLM. ☆11 · Mar 19, 2024 · Updated 2 years ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆25 · Aug 11, 2023 · Updated 2 years ago
- This repo contains all iNeuron Full Stack Data Science Assignments ☆12 · Jun 6, 2023 · Updated 2 years ago
- Simple Python wrapper for ABBYY Cloud OCR ☆16 · Feb 5, 2024 · Updated 2 years ago
- Develop ML models predict taxi trip duration in NYC. Ranked : Top 6% | RMSLE : 0.377 (Kaggle) | #DS ☆17 · Jan 7, 2023 · Updated 3 years ago
- Project on belief embedding ☆21 · Jun 4, 2025 · Updated 10 months ago
- ☆17 · Feb 9, 2023 · Updated 3 years ago
- Deployed on expo go ☆23 · May 22, 2022 · Updated 3 years ago
- The repository for the CMU Data Pipeline course. This year's course should use branch 2017 ☆40 · May 2, 2017 · Updated 8 years ago
- *****PROJECT SPECIFICATION: Machine Learning Capstone Analysis Project***** This capstone project involves machine learning modeling and… ☆15 · Mar 28, 2018 · Updated 8 years ago
- A simple to use python script for Automatic License Plate Recognition using Google Cloud Vision API. ☆15 · Apr 29, 2018 · Updated 7 years ago
- In this web scraping project, my goal is to extract real-time stock market data from the renowned Yahoo Finance website. By leveraging we… ☆13 · Jun 12, 2023 · Updated 2 years ago
- This repository is created to improve SQL skill. The very basic requirement/skill needed for Data Analyst is SQL, this 30-day SQL Questio… ☆20 · Aug 30, 2023 · Updated 2 years ago
- A real time analytics dashboard to analyze the trending hashtags and @ mentions at any location using kafka and spark streaming. ☆21 · Oct 15, 2024 · Updated last year
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste… ☆12 · Oct 11, 2023 · Updated 2 years ago
- The project focuses on the drowsiness of IT employees, drivers, pilots, crane operators, student etc. These people need a system which ca… ☆14 · Sep 13, 2018 · Updated 7 years ago