Spark data pipeline that processes movie ratings data.
☆31Mar 1, 2026Updated 3 weeks ago
Alternatives and similar repositories for spark-movies-etl
Users that are interested in spark-movies-etl are comparing it to the libraries listed below.
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time.☆13Mar 1, 2026Updated 3 weeks ago
- Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extrac…☆10Jul 12, 2021Updated 4 years ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…☆25Aug 11, 2023Updated 2 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark☆26Sep 16, 2019Updated 6 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.☆29Aug 14, 2023Updated 2 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D…☆29Aug 8, 2020Updated 5 years ago
- End-to-end ELT data engineering project☆22Dec 24, 2022Updated 3 years ago
- NoSQL extract, transform, load (ETL) toolkit with Python☆15Updated this week
- PySpark functions and utilities with examples. Assists ETL process of data modeling☆104Dec 3, 2020Updated 5 years ago
- Boilerplate for PySpark on Cloud Kubernetes☆33Oct 12, 2021Updated 4 years ago
- an end-to-end data pipeline extracting music listening habits and producing an insightful dashboard☆17Mar 31, 2024Updated last year
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆29Jun 7, 2023Updated 2 years ago
- ☆18Aug 6, 2024Updated last year
- A repo to track data engineering projects☆13Nov 11, 2022Updated 3 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net…☆16May 21, 2024Updated last year
- Surface crack images classification using PyTorch Lightning☆11Jun 17, 2020Updated 5 years ago
- In this project I have built an ETL pipeline which scrapes the trending repository based on month, week and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- ☆10May 16, 2022Updated 3 years ago
- a list of links to help you make various important architectural decisions☆11Jul 13, 2016Updated 9 years ago
- A React component to implement continuous scrolling (for modern browser).☆17Jan 12, 2017Updated 9 years ago
- Cool DE Projects☆68Mar 22, 2026Updated last week
- A pipeline to CI/CD of a machine learning model on Google Cloud Run☆32May 1, 2023Updated 2 years ago
- Command line client for the Fugue API☆14Mar 7, 2023Updated 3 years ago
- Build a program that predicts the programming language of any source code☆13Mar 31, 2021Updated 4 years ago
- An implementation of WaveNet using PyTorch & PyTorch Lightning☆13Apr 23, 2020Updated 5 years ago
- Starter application demonstrating how to connect a NestJS API to a PlanetScale MySQL database☆11Apr 12, 2023Updated 2 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …☆17Oct 1, 2019Updated 6 years ago
- Solutions & Code Related to Blog Posts☆11Nov 6, 2024Updated last year
- torrentDownloader☆11Jul 23, 2016Updated 9 years ago
- ☆27Mar 7, 2022Updated 4 years ago
- dbt project for the domestic heating agent-based model at Centre for Net Zero.☆12Nov 15, 2022Updated 3 years ago
- simple arbitrage☆13Jul 29, 2010Updated 15 years ago
- Learning and building an API using FastAPI☆16Aug 7, 2021Updated 4 years ago
- Skooldio: Data Pipelines with Airflow☆23May 24, 2025Updated 10 months ago
- Powershell Scripts for Power BI☆13Sep 20, 2023Updated 2 years ago
- EOSIO-Taurus - The Most Powerful Infrastructure for Decentralized Applications☆13Mar 29, 2024Updated 2 years ago
- A custom AWS credential provider that allows your Hadoop or Spark application access S3 file system by assuming a role☆10Jan 9, 2026Updated 2 months ago
- ☆10Apr 15, 2023Updated 2 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy☆22Dec 26, 2020Updated 5 years ago