wirelessr / flink-iceberg-playground
MinIO as local storage and DynamoDB as catalog
☆13 Updated 9 months ago
Alternatives and similar repositories for flink-iceberg-playground:
Users who are interested in flink-iceberg-playground are comparing it to the libraries listed below
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 Updated 8 years ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆19 Updated 4 years ago
- Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark ☆13 Updated last year
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 3 years ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this … ☆12 Updated this week
- Using the Parquet file format (with Avro) to process data with Apache Flink ☆14 Updated 9 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 Updated last year
- Automatically loads new partitions in AWS Athena ☆18 Updated 4 years ago
- GetInData Helm Charts repository ☆12 Updated 2 years ago
- A testing framework for Trino ☆26 Updated 3 months ago
- ☆46 Updated last month
- ☆22 Updated 5 years ago
- Data Catalog for Databases and Data Warehouses ☆32 Updated last year
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Unity Catalog UI ☆39 Updated 5 months ago
- ☆13 Updated last week
- Apache Iceberg Spark S3 examples ☆19 Updated 11 months ago
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆10 Updated 2 years ago
- ☆47 Updated 6 months ago
- This is a basic Apache Pinot example for ingesting real-time MySQL change logs using Debezium ☆27 Updated 4 years ago
- Analysis of the CI workflows of the trinodb/trino project ☆14 Updated this week
- Data Sketches for Apache Spark ☆22 Updated 2 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆50 Updated last year
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆9 Updated last year
- Demos using Conduktor Gateway ☆16 Updated 10 months ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆44 Updated last year
- Presto Trino with Apache Hive Postgres metastore ☆39 Updated 5 months ago
- Lab project to showcase Flink's performance differences between using a SQL query and implementing the same logic via the DataStream API ☆14 Updated 4 years ago
- KSQL Syntax Highlighting for VSCode ☆17 Updated 2 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated this week