aws-samples / iceberg-streaming-examples
This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. The examples cover IoT and CDC scenarios and follow best practices. The code can be deployed to any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
☆20 · Updated 2 months ago
Alternatives and similar repositories for iceberg-streaming-examples:
Users who are interested in iceberg-streaming-examples are comparing it to the repositories listed below.
- Sample code to collect Apache Iceberg metrics for table monitoring ☆23 · Updated 5 months ago
- ☆16 · Updated last year
- Operational Data Processing Framework developed using AWS Glue and Apache Hudi. This framework is suitable for Data Lake and Modern Data … ☆21 · Updated last year
- Streaming ETL job cases in AWS Glue to integrate Iceberg and creating an in-place updatable data lake on Amazon S3 ☆20 · Updated 4 months ago
- dbt / Amazon Redshift Demonstration Project ☆33 · Updated 2 years ago
- In this repository, we show how to get started with data lineage on AWS using OpenLineage. This is an AWS Cloud Development Kit project (… ☆12 · Updated 6 months ago
- ☆16 · Updated 9 months ago
- Demo for GitHub Universe 2022 ☆12 · Updated last year
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats ☆29 · Updated last year
- ☆31 · Updated 11 months ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you… ☆11 · Updated this week
- AWS Glue Configurable Test Data Generator for S3 Data Lakes and DynamoDB ☆16 · Updated last year
- Unity Catalog UI ☆39 · Updated 4 months ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆19 · Updated 4 years ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated 2 years ago
- DataHub on AWS demonstration resources ☆10 · Updated 2 years ago
- ☆24 · Updated 5 months ago
- This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS … ☆19 · Updated 3 years ago
- ☆19 · Updated 3 months ago
- Delta Lake Documentation ☆48 · Updated 7 months ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 · Updated last year
- Example Set up For DBT Cloud using Github Integrations ☆11 · Updated 4 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆43 · Updated last year
- Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR ☆17 · Updated 5 months ago
- A guidance that provides declarative data processing capability, and workflow orchestration automation to help your business users (such … ☆29 · Updated 9 months ago
- Spark runtime on AWS Lambda ☆105 · Updated 4 months ago
- This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, … ☆23 · Updated 2 months ago
- ☆11 · Updated this week
- ☆34 · Updated 2 years ago