aws-samples/aws-concurrent-data-orchestration-pipeline-emr-livy

This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/), which creates a concurrent data pipeline using Amazon EMR and Apache Livy. The pipeline is orchestrated by Apache Airflow.

☆76

Alternatives and similar repositories for aws-concurrent-data-orchestration-pipeline-emr-livy

Users who are interested in aws-concurrent-data-orchestration-pipeline-emr-livy are comparing it to the repositories listed below.

Marcus-L / serverless-mailgun-slack
A Serverless function for posting to a Slack Webhook in response to a Mailgun route
☆11 · Oct 12, 2016 · Updated 9 years ago
aws-samples / cloud-operations-best-practices
☆18 · Updated this week
villasv / aws-airflow-stack
Turbine: the bare metals that gets you Airflow
☆379 · Oct 10, 2021 · Updated 4 years ago
jupyterlab / benchmarks
Benchmarking tools for JupyterLab
☆12 · Jun 11, 2023 · Updated 3 years ago
dzimine / slack-signup-serverless
Serverless sign-up to Slack (and other services) with Serverless.com
☆30 · Nov 2, 2018 · Updated 7 years ago
awslabs / emr-dynamodb-connector
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
☆228 · Apr 8, 2026 · Updated 3 months ago
adaltas / spark-streaming-pyspark
Build and run Spark Structured Streaming pipelines in Hadoop - project using PySpark.
☆13 · Jun 6, 2019 · Updated 7 years ago
aws-samples / aws-etl-orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
☆345 · Mar 29, 2024 · Updated 2 years ago
KenDooley / Tast.io
☆11 · Mar 24, 2015 · Updated 11 years ago
infrablocks / terraform-aws-ecs-load-balancer
Terraform module for deploying a load balancer to be used by a service in an existing ECS cluster in AWS
☆12 · Jul 10, 2026 · Updated last week
larsrinn / papermill-lambda
☆12 · Oct 12, 2018 · Updated 7 years ago
awslabs / amazon-emr-on-eks-custom-image-cli
Amazon EMR on EKS Custom Image CLI
☆32 · Sep 26, 2024 · Updated last year
nicor88 / aws-ecs-airflow
Run Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
☆160 · Oct 24, 2024 · Updated last year
aws-samples / redshift-immersionday-labs
This GitHub project provides a series of lab exercises which help users get started using the Redshift platform.
☆53 · Mar 31, 2021 · Updated 5 years ago
msfidelis / serverless-pipeline
Pipeline to build, test and deploy Serverless Framework Projects with CodeBuild and CodePipeline on AWS using Terraform.
☆42 · Mar 12, 2019 · Updated 7 years ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆64 · Sep 6, 2024 · Updated last year
dsaidgovsg / airflow-pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
☆176 · Apr 13, 2026 · Updated 3 months ago
traviscrawford / spark-dynamodb
DynamoDB data source for Apache Spark
☆95 · Sep 2, 2021 · Updated 4 years ago
rssanders3 / airflow-spark-operator-plugin
A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator
☆73 · Sep 20, 2019 · Updated 6 years ago
yodasco / pyspark-emr
A toolset to streamline running spark python on EMR
☆20 · Nov 16, 2016 · Updated 9 years ago
nathanpeck / greeter-cdk
Example AWS Cloud Development Kit app that deploys the greeter microservice stack
☆19 · Mar 27, 2026 · Updated 3 months ago
aws-samples / aws-building-data-lake-reinvent-session-stg206
Collection of Cloud Formation Templates, Lambda Scripts and sample code required to provision an AWS Data Lake for a ReInvent Lab Exercis…
☆26 · Apr 9, 2019 · Updated 7 years ago
ComcastSamples / KinesisShardCalculator
Compute the optimal number of shards for your Kinesis stream
☆18 · Jan 10, 2019 · Updated 7 years ago
evanmiller29 / aws_model_blog_post
Repository for hosting models on AWS blog post
☆14 · May 6, 2019 · Updated 7 years ago
andresionek91 / data-scientist-value
Flask app to calculate compensation of a data scientist
☆12 · Dec 27, 2022 · Updated 3 years ago
richardanaya / spark_delta_lake
☆16 · Jun 27, 2020 · Updated 6 years ago
ekampf / PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
☆393 · Jan 21, 2024 · Updated 2 years ago
didil / serverless-testing-examples
Serverless Testing Examples
☆23 · Jul 16, 2018 · Updated 8 years ago
awslabs / aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a…
☆230 · May 18, 2026 · Updated 2 months ago
aws-samples / aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
☆78 · May 13, 2020 · Updated 6 years ago
aws-samples / aws-glue-samples
AWS Glue code samples
☆1,539 · Jun 8, 2026 · Updated last month
aws-samples / streaming-analytics-workshop
Learn how to build an end-to-end streaming architecture to ingest, analyze, and visualize streaming data in near real-time
☆34 · Jul 12, 2022 · Updated 4 years ago
sheepkiller / presto-marathon-docker
On demand presto cluster with mesos, marathon and docker.
☆29 · Mar 7, 2018 · Updated 8 years ago
awslabs / amazon-s3-tagging-spark-util
☆12 · Oct 16, 2023 · Updated 2 years ago
andresionek91 / Job-Listing-Scraper
Scraps jobs listings from Glassdoor
☆33 · Nov 21, 2019 · Updated 6 years ago
awslabs / amazon-s3-step-functions-ingestion-orchestration
Design pattern for orchestrating an incremental data ingestion pipeline using AWS Step Functions from an on premise location into an Amaz…
☆29 · Jul 24, 2019 · Updated 6 years ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
aws-samples / aws-ml-data-lake-workshop
As customers move from building data lakes and analytics on AWS to building machine learning solutions, one of their biggest challenges i…
☆63 · Nov 28, 2018 · Updated 7 years ago
toshi-k / kaggle-bosch-production-line-performance
57th place solution in "Bosch Production Line Performance"
☆19 · May 19, 2017 · Updated 9 years ago