yennanliu/spark-etl-pipeline

Various data stream/batch process demo with Apache Scala Spark 🚀

☆12

Alternatives and similar repositories for spark-etl-pipeline

Users that are interested in spark-etl-pipeline are comparing it to the libraries listed below.

avensolutions / spark-sql-etl-framework
Multi-stage, config driven, SQL based ETL framework using PySpark
☆26 · Sep 16, 2019 · Updated 6 years ago
vectra-ai-research / pyspark-style-guide
Our style guide for writing readable and maintainable PySpark code.
☆17 · Dec 21, 2021 · Updated 4 years ago
drabastomek / learningPySpark_video
Learning PySpark video series
☆11 · Mar 5, 2018 · Updated 8 years ago
san089 / Optimizing-Public-Transportation
A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.
☆33 · Aug 14, 2023 · Updated 2 years ago
d-e-n-t-y / pg_fdw_mv_rewrite
☆10 · Jul 31, 2019 · Updated 6 years ago
RahulBhalley / favorite-research-papers
Listing my favorite research papers 📝 from different fields as I read them.
☆10 · Oct 17, 2019 · Updated 6 years ago
sungchun12 / serverless-data-pipeline-gcp
Schedule a data pipeline in Google Cloud using cloud function, BigQuery, cloud storage, cloud scheduler, stack trace, cloud build, and p…
☆25 · Jun 4, 2019 · Updated 7 years ago
jamesbyars / apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…
☆24 · Aug 11, 2023 · Updated 2 years ago
spbsmile / Caveman
Indie game project driven by a few enthusiasts.
☆15 · Aug 25, 2017 · Updated 8 years ago
postgrespro / plantuner
☆10 · Feb 12, 2021 · Updated 5 years ago
bonlime / sota_imagenet
Code for training on Imagenet to SOTA results using PyTorch
☆13 · Aug 14, 2023 · Updated 2 years ago
boto / botoflow
Botoflow is an asynchronous framework for Amazon SWF that helps you build SWF applications using Python
☆13 · Dec 26, 2022 · Updated 3 years ago
Rachnog / From-Physics-To-GANs
Code for the blog post
☆12 · Jan 15, 2021 · Updated 5 years ago
guidok91 / spark-movies-etl
Spark data pipeline that processes movie ratings data.
☆31 · Updated this week
karthikeyankc / Swoopy
A Python script to swoop and decrypt passwords from Chrome's local storage.
☆11 · Dec 10, 2018 · Updated 7 years ago
asampat3090 / production-level-machine-learning
Curated list of awesome open source repositories for data pipelining and machine learning in production.
☆17 · Dec 1, 2019 · Updated 6 years ago
shravan-kuchkula / udacity-data-eng-proj2
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d…
☆24 · Nov 22, 2021 · Updated 4 years ago
geojames / photoscan
Python scripts for Agisoft Photoscan
☆12 · Jun 18, 2015 · Updated 11 years ago
D4Vinci / WifiPass
Dump the saved wifi passwords for windows using regular expressions and python 3
☆17 · Dec 22, 2016 · Updated 9 years ago
twosigma / postgresql-contrib
☆13 · Jun 7, 2018 · Updated 8 years ago
sarwarisak / captcha
Solving Captchas using Deep Learning
☆13 · Apr 17, 2023 · Updated 3 years ago
mitmedialab / 3D-VAE
Minimalist implementation of VQ-VAE in Pytorch
☆10 · Sep 9, 2018 · Updated 7 years ago
geordielad / tableau-athena-credential-provider-examples
How to customize Tableau authentication using the AWS Athena's JDBC Credentials Provider capabilities.
☆14 · Jun 8, 2020 · Updated 6 years ago
CSLDepend / exploits
We store attacks and exploits that we've found useful in our research
☆13 · Jun 4, 2015 · Updated 11 years ago
mahapatra09 / aflux
☆10 · Dec 16, 2022 · Updated 3 years ago
linuxacademy / content-google-cloud-run-deep-dive
Repository for Google Cloud Run Deep Dive
☆11 · Jul 8, 2020 · Updated 6 years ago
narenmanoharan / Movie-Recommender-System
Movie recommender system with Collaborative Filtering using PySpark
☆28 · Apr 17, 2017 · Updated 9 years ago
danielbeach / datahobbit
A Rust based data/CSV/Parquet file generator
☆66 · Mar 3, 2025 · Updated last year
zeroc0d3lab / awesome-scalability
Daily-updated reading list for designing High Scalability, High Availability, High Stability back-end systems - Pull requests are gre…
☆15 · Jul 14, 2022 · Updated 4 years ago
hurtn / databricks
☆12 · Aug 6, 2020 · Updated 5 years ago
ispoljari / run-tracker-app
The purpose of this app is to enable users to log their runs and share their activity with other users on the platform
☆18 · Feb 11, 2019 · Updated 7 years ago
aau-daisy / solvedb
SolveDB: A PostgreSQL-based DBMS for optimization applications
☆18 · Mar 1, 2021 · Updated 5 years ago
Intuz-production / Firebase-Phone-Verification-Android
Android Custom Firebase Phone Verification
☆17 · Mar 11, 2026 · Updated 4 months ago
waynegraham / photoscan_scripts
Scripts for the Python API in PhotoScan
☆17 · Sep 1, 2015 · Updated 10 years ago
RangeNetworks / subscriberRegistry
Subscriber Registry API and SIP Authentication Server
☆19 · Jul 13, 2016 · Updated 10 years ago
cynance / alpaca-scala
Scala library for alpaca.markets
☆12 · Aug 5, 2019 · Updated 6 years ago
tcharding / self_learning
Text books and programming problem websites
☆12 · Apr 22, 2026 · Updated 3 months ago
SamWSoftware / goldenshoes
☆13 · Feb 27, 2018 · Updated 8 years ago
lambdaofgod / api-examples
Examples of using various APIs from (mostly) Python
☆16 · Jul 24, 2022 · Updated 4 years ago