Various data stream/batch process demo with Apache Scala Spark
☆12 · Feb 28, 2020 · Updated 6 years ago
Alternatives and similar repositories for spark-etl-pipeline
Users that are interested in spark-etl-pipeline are comparing it to the libraries listed below.
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- Our style guide for writing readable and maintainable PySpark code. ☆17 · Dec 21, 2021 · Updated 4 years ago
- Learning PySpark video series ☆11 · Mar 5, 2018 · Updated 8 years ago
- ☆10 · Jul 31, 2019 · Updated 6 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆32 · Aug 14, 2023 · Updated 2 years ago
- Listing my favorite research papers from different fields as I read them. ☆10 · Oct 17, 2019 · Updated 6 years ago
- Schedule a data pipeline in Google Cloud using cloud function, BigQuery, cloud storage, cloud scheduler, stack trace, cloud build, and p… ☆26 · Jun 4, 2019 · Updated 6 years ago
- Indie game project driven by a few enthusiasts. ☆15 · Aug 25, 2017 · Updated 8 years ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆25 · Aug 11, 2023 · Updated 2 years ago
- ☆10 · Feb 12, 2021 · Updated 5 years ago
- Maven plugin for Scalastyle ☆23 · Apr 11, 2023 · Updated 2 years ago
- Code for training on Imagenet to SOTA results using PyTorch ☆13 · Aug 14, 2023 · Updated 2 years ago
- Code for the blog post ☆12 · Jan 15, 2021 · Updated 5 years ago
- Botoflow is an asynchronous framework for Amazon SWF that helps you build SWF applications using Python ☆13 · Dec 26, 2022 · Updated 3 years ago
- A Python script to swoop and decrypt passwords from Chrome's local storage. ☆11 · Dec 10, 2018 · Updated 7 years ago
- Curated list of awesome open source repositories for data pipelining and machine learning in production. ☆17 · Dec 1, 2019 · Updated 6 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Nov 22, 2021 · Updated 4 years ago
- Spark data pipeline that processes movie ratings data. ☆31 · Mar 1, 2026 · Updated 3 weeks ago
- This project integrates HP ALM with other test automation frameworks. ☆10 · May 25, 2020 · Updated 5 years ago
- Python scripts for Agisoft Photoscan ☆12 · Jun 18, 2015 · Updated 10 years ago
- Dump the saved wifi passwords for windows using regular expressions and python 3 ☆17 · Dec 22, 2016 · Updated 9 years ago
- Publicly shareable content about the University of Texas MSAIO program ☆20 · Jul 2, 2025 · Updated 8 months ago
- Minimalist implementation of VQ-VAE in Pytorch ☆10 · Sep 9, 2018 · Updated 7 years ago
- Solving Captchas using Deep Learning ☆13 · Apr 17, 2023 · Updated 2 years ago
- Daily-updated reading list for designing High Scalability, High Availability, High Stability back-end systems - Pull requests are gre… ☆15 · Jul 14, 2022 · Updated 3 years ago
- How to customize Tableau authentication using AWS Athena's JDBC Credentials Provider capabilities. ☆14 · Jun 8, 2020 · Updated 5 years ago
- ☆13 · Jun 7, 2018 · Updated 7 years ago
- A Rust based data/CSV/Parquet file generator ☆65 · Mar 3, 2025 · Updated last year
- ☆10 · Dec 16, 2022 · Updated 3 years ago
- We store attacks and exploits that we've found useful in our research ☆13 · Jun 4, 2015 · Updated 10 years ago
- Scala library for alpaca.markets ☆12 · Aug 5, 2019 · Updated 6 years ago
- Repository for Google Cloud Run Deep Dive ☆11 · Jul 8, 2020 · Updated 5 years ago
- Movie recommender system with Collaborative Filtering using PySpark ☆28 · Apr 17, 2017 · Updated 8 years ago
- ☆13 · Feb 27, 2018 · Updated 8 years ago
- Template for those following Creative Scala ☆17 · Feb 6, 2025 · Updated last year
- Examples for ETL Integrations with Adobe Experience Platform ☆14 · Aug 16, 2024 · Updated last year
- List of awesome university courses for learning Computer Science! ☆14 · Oct 1, 2018 · Updated 7 years ago
- ☆12 · Aug 6, 2020 · Updated 5 years ago
- NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase ☆16 · Oct 31, 2014 · Updated 11 years ago