A real-time streaming ETL pipeline that ingests Twitter data and performs sentiment analysis on it using Apache Kafka, Apache Spark, and Delta Lake.
☆29Aug 8, 2020Updated 5 years ago
Alternatives and similar repositories for spark-twitter-streaming
Users that are interested in spark-twitter-streaming are comparing it to the libraries listed below.
- Pyspark Spotify ETL☆17Aug 19, 2021Updated 4 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS☆51Aug 23, 2019Updated 6 years ago
- Code snippets and tools published on the blog at lifearounddata.com☆12Jan 19, 2020Updated 6 years ago
- ☆22Jan 21, 2026Updated 2 months ago
- A simple demo showing how to use Ably and fastAPI to route messages into Kafka for stream processing☆16Oct 12, 2021Updated 4 years ago
- Data Guy Story commandline☆11Dec 2, 2022Updated 3 years ago
- A consumer of a Kafka topic based on Flink☆12Oct 5, 2022Updated 3 years ago
- Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.☆16Sep 18, 2021Updated 4 years ago
- ☆15Jul 31, 2022Updated 3 years ago
- A repo to track data engineering projects☆13Nov 11, 2022Updated 3 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net…☆16May 21, 2024Updated last year
- 🚀 Complete AWS learning path for beginners - 45K+ community resource with hands-on labs, workshops, and certification guides☆18Feb 18, 2026Updated last month
- Django Based Hotel Management App☆15Nov 22, 2022Updated 3 years ago
- ☆11Sep 30, 2022Updated 3 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆76Feb 15, 2023Updated 3 years ago
- This provider contains operators, decorators and triggers to send a ray job from an airflow task☆24Oct 27, 2025Updated 5 months ago
- A fully functional Twitter Clone built with Django.☆18Mar 19, 2021Updated 5 years ago
- SQL Server 2017 Integration Services Cookbook, published by Packt☆17Jan 30, 2023Updated 3 years ago
- A CI/CD pipeline for a machine learning model on Google Cloud Run☆32May 1, 2023Updated 2 years ago
- Community Themes☆27Jan 3, 2019Updated 7 years ago
- Source code for 'Pro Power BI Desktop' by Adam Aspin☆22Dec 4, 2017Updated 8 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …☆17Oct 1, 2019Updated 6 years ago
- I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti…☆29May 2, 2023Updated 2 years ago
- Tweepy Stream Example☆19Apr 23, 2019Updated 6 years ago
- Delta-Lake, ETL, Spark, Airflow☆48Oct 9, 2022Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆56May 6, 2023Updated 2 years ago
- Udacity data engineering nanodegree project☆22Nov 8, 2019Updated 6 years ago
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health☆29Apr 29, 2023Updated 2 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy☆22Dec 26, 2020Updated 5 years ago
- Data Engineering with AWS Cookbook, published by Packt☆24Dec 1, 2024Updated last year
- PySpark functions and utilities with examples. Assists ETL process of data modeling☆104Dec 3, 2020Updated 5 years ago
- Vocabulary objects for natural language processing☆14Jun 1, 2020Updated 5 years ago
- In this project, we will build an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro…☆25May 6, 2023Updated 2 years ago
- This repository contains all tutorials for Apache Spark, Delta Lake, Koalas, MLflow, and others.☆16May 29, 2020Updated 5 years ago
- K8s infrastructure repository☆11Updated this week
- A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc…☆23Nov 19, 2024Updated last year
- This is a capstone project that entails building an end-to-end ETL (Extract-Transform-Load) Data pipeline which extracts UK accident and …☆18Jun 6, 2020Updated 5 years ago
- ☆16Dec 13, 2020Updated 5 years ago
- Simple log parsing example in Python☆14Oct 7, 2015Updated 10 years ago