aws-samples / iceberg-streaming-examples
This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed to any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
☆23 Updated 5 months ago
Alternatives and similar repositories for iceberg-streaming-examples:
Users who are interested in iceberg-streaming-examples are comparing it to the repositories listed below
- Sample code to collect Apache Iceberg metrics for table monitoring ☆26 Updated 8 months ago
- ☆11 Updated 4 months ago
- Unity Catalog UI ☆40 Updated 7 months ago
- Utility functions for dbt projects running on Spark ☆32 Updated 2 months ago
- dbt / Amazon Redshift Demonstration Project ☆34 Updated 2 years ago
- ☆16 Updated 2 years ago
- Demonstrating the capabilities of DuckDB as a transformation engine for data lakes ☆23 Updated 6 months ago
- Streaming ETL job cases in AWS Glue to integrate Iceberg and creating an in-place updatable data lake on Amazon S3 ☆23 Updated 7 months ago
- duckdb-etl-framework ☆10 Updated 3 months ago
- Demo for GitHub Universe 2022 ☆12 Updated 2 years ago
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats ☆29 Updated 2 years ago
- Delta Lake Documentation ☆49 Updated 9 months ago
- Docker environment to stream data from Kafka to Iceberg tables ☆27 Updated last year
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a… ☆25 Updated 11 months ago
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ☆52 Updated last month
- This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS … ☆19 Updated 3 years ago
- A DataOps framework for building a lakehouse. ☆50 Updated this week
- ☆19 Updated 2 months ago
- Python implementation of Age-Partitioned Bloom Filter with S3 periodic backup support. ☆11 Updated 2 months ago
- In this repository, we show how to get started with data lineage on AWS using OpenLineage. This is an AWS Cloud Development Kit project (… ☆12 Updated 8 months ago
- Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR ☆18 Updated 8 months ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 2 years ago
- Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb ☆20 Updated last year
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. ☆10 Updated last year
- aws-solutions-library-samples / guidance-for-preparing-and-validating-records-for-entity-resolution-on-aws: This Guidance demonstrates how to prepare and validate Personally Identifiable Information (PII) data, including physical address, phone,… ☆9 Updated 5 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 Updated last year
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆50 Updated last year
- DataHub on AWS demonstration resources ☆10 Updated 2 years ago
- 📆 Run, schedule, and manage your dbt jobs using Kubernetes. ☆24 Updated 6 years ago
- ☆27 Updated 3 weeks ago