wirelessr / flink-iceberg-playground
MinIO as local storage and DynamoDB as catalog
☆11Updated 5 months ago
Related projects
Alternatives and complementary repositories for flink-iceberg-playground
- Using the Parquet file format (with Avro) to process data with Apache Flink☆14Updated 9 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection☆18Updated 7 years ago
- Dashboard for operating Flink jobs and deployments.☆25Updated 3 weeks ago
- Demos using Conduktor Gateway☆16Updated 6 months ago
- AWS Quick Start Team☆14Updated last month
- Automatically loads new partitions in AWS Athena☆18Updated 4 years ago
- Cloud Storage Connector integrates Apache Pulsar with cloud storage.☆28Updated this week
- This is a basic Apache Pinot example for ingesting real-time MySQL change logs using Debezium☆27Updated 3 years ago
- Ingest JSON records from Kafka to multiple tables in the database using the DataStax Apache Kafka Connector☆13Updated 2 years ago
- This repository contains recipes for Apache Pinot.☆24Updated last month
- Set of tools for creating backups, compaction and restoration of Apache Kafka® Clusters☆18Updated last week
- Connect DbVisualizer to Hortonworks HiveServer2☆9Updated 9 years ago
- Collection of code examples for Amazon Managed Service for Apache Flink☆39Updated this week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…☆10Updated last year
- Hadoop/Hive/Spark container to perform CI tests☆11Updated 3 years ago
- Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark☆13Updated last year
- Amazon EMR on EKS Custom Image CLI☆25Updated last month
- A curated list of Apache Pulsar resources☆13Updated 6 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake☆36Updated 7 months ago
- GetInData Helm Charts repository☆12Updated 2 years ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep…☆19Updated 4 years ago
- Scalable CDC Pattern Implemented using PySpark☆18Updated 5 years ago
- An application that records stats about consumer group offset commits and reports them as Prometheus metrics☆14Updated 5 years ago
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor…☆41Updated last year
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Updated 3 years ago
- A testing framework for Trino☆25Updated 3 months ago
- ARCHIVED: Run Debezium/KafkaConnect CDC components in Kubernetes☆24Updated 5 years ago