dogukannulu/glue_etl_job_data_catalog_s3



A Glue ETL job (or EMR Spark job) that reads data from the Data Catalog, modifies it, and uploads the result to S3 and the Data Catalog.

☆13

Alternatives and similar repositories for glue_etl_job_data_catalog_s3

Users that are interested in glue_etl_job_data_catalog_s3 are comparing it to the libraries listed below.


dogukannulu / crypto_api_kafka_airflow_streaming
Get crypto data from an API and stream it to Kafka with Airflow, then write the data to MySQL and visualize it with Metabase.
☆17 · Oct 2, 2023 · Updated 2 years ago
dogukannulu / aws_end_to_end_streaming_pipeline
An AWS Data Engineering End-to-End Project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3)
☆17 · Sep 20, 2023 · Updated 2 years ago
dogukannulu / airflow_kafka_cassandra_mongodb
Produce Kafka messages, consume them, and load them into Cassandra and MongoDB.
☆43 · Sep 26, 2023 · Updated 2 years ago
JChz6 / DE-ZCamp-Project
☆15 · Mar 29, 2024 · Updated 2 years ago
dogukannulu / streaming_data_processing
Create streaming data, send it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO.
☆65 · Jul 21, 2023 · Updated 2 years ago
EdidiongEsu / capital_bikeshare
☆20 · Apr 3, 2024 · Updated 2 years ago
judeleonard / Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that leverages PySpark parallel processing to process about 25 million rows of data coming from a SaaS ap…
☆26 · Dec 7, 2022 · Updated 3 years ago
yunusgrgz1 / flight-map-kafka-spark-mongo-postgresql
☆16 · Oct 8, 2025 · Updated 9 months ago
mvahit / dsmlbc5
☆11 · Nov 23, 2021 · Updated 4 years ago
vordimous / gohlay
The Kafka message scheduling tool.
☆19 · Jan 20, 2025 · Updated last year
SweetAdjPotato / machine-learning-algorithms-with-without-libraries
☆11 · Mar 7, 2021 · Updated 5 years ago
PDahlen / InvestmentBanker
☆11 · Aug 7, 2023 · Updated 2 years ago
shivanshkaushikk / mistral-image-captioning-agent
Image Captioning Agent using Mistral 7B
☆11 · Dec 1, 2023 · Updated 2 years ago
mehroosali / databricks-F1-Project
A data pipeline project built on Databricks and Azure to demonstrate the lifecycle of a cloud data project.
☆18 · Jan 12, 2022 · Updated 4 years ago
tatwan / airflow-spark-aws-emr
☆12 · Mar 6, 2021 · Updated 5 years ago
emmaliaocode / vagrant-vmware-arm
Provision Ubuntu VMs with Vagrant and VMware on macOS ARM64
☆11 · Oct 26, 2023 · Updated 2 years ago
dogukannulu / kafka_spark_structured_streaming
Get data from an API, run a scheduled script with Airflow, send the data to Kafka, consume it with Spark, then write it to Cassandra.
☆146 · Jul 27, 2023 · Updated 2 years ago
kamilmuratyilmaz / earthquake-location-from-image
☆12 · Feb 6, 2023 · Updated 3 years ago
salmah52 / youtubeetl
☆15 · Oct 19, 2023 · Updated 2 years ago
sheridan-python / tutorial-vending-machine-solution
tutorial-vending-machine-marcgibbons created by GitHub Classroom
☆10 · May 22, 2019 · Updated 7 years ago
aws-samples / amazon-sagemaker-genai-content-moderation
☆15 · Apr 10, 2024 · Updated 2 years ago
ipeluffo / faust-hashtags-counter
Sample Faust project to process tweets in real-time
☆13 · Mar 29, 2021 · Updated 5 years ago
YFChiu / Resources--Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0
(Python, PySpark)
☆11 · Nov 15, 2020 · Updated 5 years ago
ryanbrownnetworking777 / dataengineerio-capstone-ryanbrown
Capstone project for the Dataengineer.io bootcamp (public repo)
☆12 · Feb 20, 2024 · Updated 2 years ago
aws-samples / sagemaker-generative-ai-for-product-placement-using-images
☆12 · Apr 13, 2026 · Updated 2 months ago
airscholar / YoutubeAnalytics
An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlD…
☆16 · Sep 19, 2023 · Updated 2 years ago
Krishnamohan-Yerrabilli / Deployment-on-K8s-cluster-using-jenkins-CI-CD
In this project, we will be deploying a Kubernetes cluster using a Jenkins CI/CD pipeline. We will be utilizing various DevOps tools such…
☆13 · Jun 6, 2023 · Updated 3 years ago
veribilimiokulu / blog-erkan
☆13 · Apr 24, 2026 · Updated 2 months ago
mInzamamMalik / Ai-Chatbot-Online
Class code for the AI chatbot and voice app online course
☆11 · Jul 22, 2025 · Updated 11 months ago
amanverasia / udemy-bot-free-courses
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary …
☆19 · Feb 29, 2024 · Updated 2 years ago
AhmetFurkanDEMIR / dataengineering-youtube-project
Data Engineering Youtube Project
☆12 · Jun 29, 2023 · Updated 3 years ago
Bahmni / openerp-modules
Custom OpenERP modules (extensions) for Bahmni
☆21 · Apr 13, 2021 · Updated 5 years ago
enessoztrk / WhatsApp_Chat_Analysis_Heroku_Deployment
☆15 · Nov 10, 2022 · Updated 3 years ago
skth5199 / graph-based-fraud-detection
Fraud detection using Graph Convolutional Networks
☆12 · May 9, 2022 · Updated 4 years ago
akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API…
☆21 · Jul 26, 2024 · Updated last year
OpenShamela / shamela_crawler
A Python 3 Scrapy web crawler to download data from Shamela Library https://shamela.ws.
☆20 · Dec 29, 2025 · Updated 6 months ago
mengyjia / Marketing-Analytics
Complete machine learning analysis to solve marketing problems.
☆19 · Apr 6, 2018 · Updated 8 years ago
lucasnscr / Resilience-Patterns
Explaining and implementing resilience patterns in microservice architecture
☆14 · Jan 12, 2023 · Updated 3 years ago
khuyentran1401 / detect-data-drift-pipeline
A pipeline to detect data drift and retrain the model when there is drift
☆25 · Aug 3, 2023 · Updated 2 years ago