awslabs/aws-glue-apache-hudi-operational-data-processing-framework

Operational Data Processing Framework developed using AWS Glue and Apache Hudi. This framework is suitable for Data Lake and Modern Data Platform implementations on the AWS Cloud.

☆24

Alternatives and similar repositories for aws-glue-apache-hudi-operational-data-processing-framework

Users who are interested in aws-glue-apache-hudi-operational-data-processing-framework are comparing it to the libraries listed below.

aws-samples / aws-glue-data-catalog-replication-utility
Replication utility for AWS Glue Data Catalog
☆80 · Aug 8, 2024 · Updated last year
aws-samples / amazon-ecs-and-aws-step-functions-design-patterns-starter-kit
☆16 · Jun 14, 2023 · Updated 3 years ago
awslabs / aws-glue-streaming-libs
☆14 · Feb 26, 2024 · Updated 2 years ago
aws-samples / aws-cdk-pipelines-datalake-etl
This solution helps you deploy ETL jobs on data lake using CDK Pipelines.
☆69 · Aug 9, 2022 · Updated 3 years ago
aws-samples / aws-s3-copy-sync-using-batch
☆20 · Sep 13, 2022 · Updated 3 years ago
bys-control / docker-prometheus-monitoring
Dockerized Prometheus + Grafana Monitoring stack
☆11 · Nov 1, 2024 · Updated last year
aws-samples / amazon-bedrock-synthetic-manufacturing-data-generator
☆14 · May 2, 2024 · Updated 2 years ago
aws-samples / aws-cdk-pipelines-datalake-infrastructure
This solution helps you deploy Data Lake Infrastructure on AWS using CDK Pipelines.
☆101 · Aug 12, 2022 · Updated 3 years ago
awslabs / data-compare-tool
☆15 · Updated this week
aws-samples / aws-lakeformation-access-controls-automation
☆20 · Aug 10, 2021 · Updated 4 years ago
aws-samples / amazon-kinesis-data-analytics-snapshot-manager-for-flink
Snapshot manager for Amazon Kinesis Data Analytics for Apache Flink helps users generate a snapshot on a periodic basis.
☆19 · Aug 30, 2023 · Updated 2 years ago
vasveena / Hudi_Demo_Notebook
Hudi Demo Notebook
☆11 · Mar 5, 2024 · Updated 2 years ago
awslabs / spark-sql-kinesis-connector
Spark Structured Streaming Kinesis Data Streams connector supports both GetRecords and SubscribeToShard (Enhanced Fan-Out, EFO)
☆41 · Updated this week
awslabs / aws-glue-blueprint-libs
☆71 · May 8, 2026 · Updated 2 months ago
ran-isenberg / appsync-events-client
AppSync Events frontend sample implementation
☆12 · Nov 16, 2024 · Updated last year
code4mk / aws-lambda-serverless-fastapi
AWS Lambda FastAPI with Serverless
☆17 · Dec 1, 2024 · Updated last year
aws-samples / iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario…
☆29 · Jul 16, 2026 · Updated last week
aws-samples / emr-studio-notebook-examples
This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio.
☆53 · Oct 31, 2023 · Updated 2 years ago
aws-samples / amazon-deequ-glue
Automated data quality suggestions and analysis with Deequ on AWS Glue
☆93 · Dec 29, 2022 · Updated 3 years ago
gtalarico / interactive-elastic-analyzer
Interactive Elasticsearch Analyzer
☆13 · Dec 8, 2022 · Updated 3 years ago
aws-solutions-library-samples / data-lakes-on-aws
Enterprise-grade, production-hardened, serverless data lake on AWS
☆482 · Oct 1, 2025 · Updated 9 months ago
gocologne / meetups
Meetup Organisation
☆10 · Oct 12, 2018 · Updated 7 years ago
karuppiah7890 / grpc-demo
Demo to try out gRPC with NodeJS gRPC client and Golang gRPC server
☆14 · Sep 1, 2021 · Updated 4 years ago
ukgovdatascience / twitter-mq-feed
A script that gets data from the Twitter real-time API, passes it to a message queue (e.g. RabbitMQ), and stores tweets in MongoDB
☆11 · Apr 20, 2017 · Updated 9 years ago
moia-oss / bastion-host-forward
CDK Construct for creating a bastion host to forward a connection to several AWS data services inside a private subnet from your local ma…
☆30 · Jul 15, 2026 · Updated last week
vincentclaes / glue-devcontainer
Glue VSCode devcontainer setup
☆14 · Jan 31, 2023 · Updated 3 years ago
olliegg123 / RFID-Jukebox-GoogleCast
A really poor way of creating an RFID Jukebox to cast to Google Home
☆15 · Dec 28, 2018 · Updated 7 years ago
aws-samples / aws-emr-utilities
☆45 · Updated this week
som-shahlab / clinical_trial_patient_matching
Zero-shot clinical trial matching with LLMs
☆19 · Mar 1, 2025 · Updated last year
simonireilly / sst-python-api
Type safe; schema compliant API with local dev baked in ✨
☆12 · Jun 23, 2021 · Updated 5 years ago
napi-rs / tar
Node.js tar binding https://docs.rs/tar/latest/tar/
☆16 · Jul 16, 2026 · Updated last week
hyesunyun / llm-meta-analysis
Automating meta-analysis of clinical trials (randomized controlled trials)
☆25 · Sep 4, 2024 · Updated last year
vnijs / quizr
Create interactive quizzes using Shiny and Rmarkdown
☆16 · Aug 18, 2024 · Updated last year
primeharbor / pht-account-configurator
Configure a new AWS Account with security best practices
☆21 · Jun 21, 2026 · Updated last month
awslabs / apn-competency-helper
APN Designations template folder structure and presentation, including APN Competency Program and APN Service Delivery Program
☆22 · Feb 11, 2025 · Updated last year
YuehHanChen / Telco_Customer_Churn_Analysis
Use Multiple Linear Regression, Python, Pandas, and Matplotlib to analyze the lifetime value and the key factors of the ‘Telco Customer C…
☆13 · May 6, 2020 · Updated 6 years ago
BlackHole1 / idea-spell-check
CSpell Check For IDEA
☆11 · Updated this week
astronomer / cosmos-demo
Demo DAGs that show how to run dbt Core in Airflow using Cosmos
☆68 · May 12, 2026 · Updated 2 months ago
mattkubej / koa-ts
TypeScript rewrite of koa
☆12 · Mar 4, 2023 · Updated 3 years ago