This project introduces PySpark, the Python API for Apache Spark, an open-source framework for distributed data processing. We explore Spark's architecture, its main components, and how they apply to real-time data analysis.
☆46 · Sep 26, 2024 · Updated last year
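Spark Structured Streaming models a live stream as an unbounded table. By default it processes that table in small micro-batches and carries aggregation state from one batch to the next. The sketch below illustrates that idea in plain Python. It is not PySpark code. The function names (`micro_batches`, `run_streaming_word_count`) and the fixed-size batching are illustrative assumptions. Real Spark triggers batches on a time interval and spreads the work across executors.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming event stream into fixed-size micro-batches (illustrative only)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def run_streaming_word_count(lines: Iterable[str], batch_size: int = 2) -> List[Dict[str, int]]:
    """Keep running word counts across micro-batches, similar in spirit to Spark's "complete" output mode."""
    state: Counter = Counter()  # running aggregate carried between batches
    snapshots: List[Dict[str, int]] = []
    for batch in micro_batches(lines, batch_size):
        # Per-batch work: split each line into words and count them.
        batch_counts = Counter(word for line in batch for word in line.split())
        state.update(batch_counts)  # merge this batch's counts into the running state
        snapshots.append(dict(state))  # emit the full result table after each batch
    return snapshots


if __name__ == "__main__":
    stream = ["spark streams data", "spark scales out", "data data"]
    for i, snapshot in enumerate(run_streaming_word_count(stream, batch_size=2)):
        print(f"after batch {i}: {snapshot}")
```

Only the small running `state` is kept between batches. The raw events are discarded once counted, which is what lets a streaming aggregation run indefinitely on an unbounded input.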
Alternatives and similar repositories for Real-Time-PySpark
Users that are interested in Real-Time-PySpark are comparing it to the libraries listed below.
- ☆12 · Dec 28, 2020 · Updated 5 years ago
- Demonstrating the efficiency of pmdarima’s auto_arima() function compared to implementing a traditional ARIMA model. ☆12 · Feb 16, 2021 · Updated 5 years ago
- ☆13 · Nov 4, 2020 · Updated 5 years ago
- Capstone project for the Dataengineer.io bootcamp (public repo) ☆12 · Feb 20, 2024 · Updated 2 years ago
- ☆13 · Apr 14, 2024 · Updated 2 years ago
- Implementing best practices for PySpark ETL jobs and applications. ☆2,101 · Jan 1, 2023 · Updated 3 years ago
- Building PD, LGD, and EAD models for financial modeling. ☆15 · Dec 19, 2023 · Updated 2 years ago
- Code related to data wrangling ☆12 · Apr 12, 2020 · Updated 6 years ago
- An AWS Data Engineering End-to-End Project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3) ☆16 · Sep 20, 2023 · Updated 2 years ago
- This project demonstrates how to build and automate an ETL pipeline written in Python and schedule it using open source Apache Airflow or… ☆23 · Aug 21, 2025 · Updated 9 months ago
- Using the 1998 DARPA Intrusion Detection Evaluation dataset, I configured a Random Forest model for anomaly detection ☆14 · Feb 15, 2019 · Updated 7 years ago
- ☆44 · May 4, 2025 · Updated last year
- ☆17 · May 26, 2023 · Updated 2 years ago
- The Ultimate Guide to Snowpark, published by Packt ☆16 · Jun 8, 2024 · Updated last year
- My solutions for the Udacity Data Engineering Nanodegree ☆34 · Oct 14, 2019 · Updated 6 years ago
- ☆21 · Oct 21, 2024 · Updated last year
- This is the first project where we worked on Apache Spark. In this project, we downloaded the datasets from KAGG… ☆23 · Oct 14, 2021 · Updated 4 years ago
- A repository for Analysis of Toronto Neighbourhoods (An IBM Data Science Capstone Project) ☆10 · Jan 15, 2021 · Updated 5 years ago
- This repo contains all the code used in the Python for Data Engineering Course ☆363 · Apr 24, 2024 · Updated 2 years ago
- Starter application demonstrating how to connect a NestJS API to a PlanetScale MySQL database ☆11 · May 6, 2026 · Updated 2 weeks ago
- ☆69 · Updated this week
- An example integration between Flask and the Preact front end library. ☆13 · Jun 20, 2022 · Updated 3 years ago
- A practical guide to dimensional data modeling with clear examples, SQL scripts, and explanations of dimension types, fact tables, and st… ☆26 · Mar 27, 2026 · Updated last month
- ☆10 · Feb 12, 2026 · Updated 3 months ago
- Spark Notebook Docker image ☆10 · Dec 29, 2017 · Updated 8 years ago
- ☆30 · Mar 19, 2024 · Updated 2 years ago
- (Python, PySpark) ☆11 · Nov 15, 2020 · Updated 5 years ago
- ☆11 · Sep 6, 2019 · Updated 6 years ago
- Set up an async pipeline in Python using Celery, RabbitMQ and MongoDB. This repo covers the end to end deployment of an async pipeline fo… ☆13 · Sep 23, 2022 · Updated 3 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆12 · Nov 18, 2023 · Updated 2 years ago
- End-to-end data engineering pipeline with various technologies to ingest real-time data. ☆27 · Nov 3, 2023 · Updated 2 years ago
- StockStream is a web application developed using Streamlit, designed to provide users with real-time stock price data, stock price predic… ☆21 · Oct 25, 2024 · Updated last year
- Student projects in the big data field. ☆19 · May 8, 2026 · Updated last week
- Nyc_Taxi_Data_Pipeline - DE Project ☆141 · Oct 21, 2024 · Updated last year
- Sample RAG pattern using Azure SQL DB, Langchain and Chainlit ☆34 · Dec 3, 2024 · Updated last year
- Text Analysis: Implementation of ULMFiT by Howard & Ruder on Twitter dataset ☆10 · Feb 7, 2019 · Updated 7 years ago
- Feature demos, integration guides & hands-on labs/projects using Kpow, Flex, Kafka, Flink, Iceberg & more ☆52 · May 11, 2026 · Updated last week
- ☆14 · Mar 16, 2026 · Updated 2 months ago
- Examples of Using DBTunnel ☆11 · Apr 24, 2024 · Updated 2 years ago