A real-time streaming ETL pipeline that ingests Twitter data and performs sentiment analysis on it using Apache Kafka, Apache Spark, and Delta Lake.
☆29Aug 8, 2020Updated 5 years ago
Alternatives and similar repositories for spark-twitter-streaming
Users who are interested in spark-twitter-streaming are comparing it to the libraries listed below.
- Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extrac…☆10Jul 12, 2021Updated 4 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS☆51Aug 23, 2019Updated 6 years ago
- Code snippets and tools published on the blog at lifearounddata.com☆12Jan 19, 2020Updated 6 years ago
- Spark data pipeline that processes movie ratings data.☆31May 1, 2026Updated 3 weeks ago
- ☆27Jan 21, 2026Updated 4 months ago
- Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark☆11May 22, 2018Updated 8 years ago
- Data Guy Story commandline☆11Dec 2, 2022Updated 3 years ago
- Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.☆16Sep 18, 2021Updated 4 years ago
- Social Media Analysis, scalable solution, flexible deployment that analyses social media contents☆10Jul 20, 2023Updated 2 years ago
- A repo to track data engineering projects☆13Nov 11, 2022Updated 3 years ago
- Source code for 'Pro Power BI Desktop' by Adam Aspin☆13Mar 28, 2017Updated 9 years ago
- Sample Python scripts to help get started with the Twitter Enterprise APIs☆23Feb 8, 2023Updated 3 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net…☆16May 21, 2024Updated 2 years ago
- 🚀 Complete AWS learning path for beginners - 45K+ community resource with hands-on labs, workshops, and certification guides☆18Apr 28, 2026Updated last month
- Django Based Hotel Management App☆15Nov 22, 2022Updated 3 years ago
- SQL Server 2017 Integration Services Cookbook, published by Packt☆17Jan 30, 2023Updated 3 years ago
- A* is a computer algorithm that is widely used in pathfinding and graph traversal, which is the process of finding a path between multipl…☆10Apr 29, 2019Updated 7 years ago
- Community Themes☆27Jan 3, 2019Updated 7 years ago
- Repository for code examples from my youtube channel and medium articles working with data in python on AWS☆29Feb 5, 2024Updated 2 years ago
- ☆10Nov 28, 2020Updated 5 years ago
- Source code for 'Pro Power BI Desktop' by Adam Aspin☆22Dec 4, 2017Updated 8 years ago
- I am using confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti…☆29May 2, 2023Updated 3 years ago
- Simple, easy to use django-based point of sale system☆15Jan 8, 2026Updated 4 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆56May 6, 2023Updated 3 years ago
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Sep 7, 2022Updated 3 years ago
- Architecture of Streaming Twitter Data into Apache Kafka cluster, performing simple sentiment analysis with afinn module, storing the dat…☆20Jan 3, 2020Updated 6 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆29Jun 7, 2023Updated 2 years ago
- Delta-Lake, ETL, Spark, Airflow☆49Oct 9, 2022Updated 3 years ago
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health☆29Apr 29, 2023Updated 3 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling☆104Dec 3, 2020Updated 5 years ago
- Data Engineering with AWS Cookbook, published by Packt☆26Apr 13, 2026Updated last month
- Processing TfL data for bike usage with Google Cloud Platform.☆45Jul 15, 2022Updated 3 years ago
- This repo demonstrates how to use AWS application auto-scaling to implement custom-scaling in your Kinesis Data Analytics for Apache Flin…☆19Feb 21, 2025Updated last year
- This repository contains tutorials for Apache Spark, Delta Lake, Koalas, MLflow, and others.☆16May 29, 2020Updated 6 years ago
- A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc…☆23Nov 19, 2024Updated last year
- This is a capstone project that entails building an end-to-end ETL (Extract-Transform-Load) Data pipeline which extracts UK accident and …☆18Jun 6, 2020Updated 5 years ago
- ☆16Dec 13, 2020Updated 5 years ago
- Simple log parsing example in Python☆14Oct 7, 2015Updated 10 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆48Dec 11, 2023Updated 2 years ago