Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
☆47 · Jul 13, 2022 · Updated 3 years ago
Alternatives and similar repositories for modern-data-lake-storage-layers
Users that are interested in modern-data-lake-storage-layers are comparing it to the libraries listed below.
- A serverless datalake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. With a series of best practices, it guides y… ☆16 · Nov 22, 2022 · Updated 3 years ago
- Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR ☆17 · Apr 27, 2025 · Updated last year
- ☆20 · Jan 19, 2024 · Updated 2 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆53 · Oct 31, 2023 · Updated 2 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆67 · Sep 23, 2023 · Updated 2 years ago
- Proof of concept of a big data cluster using open source tools ☆11 · Apr 10, 2024 · Updated 2 years ago
- Delta-Lake, ETL, Spark, Airflow ☆49 · Oct 9, 2022 · Updated 3 years ago
- Auto-fixing errors due to version upgrades, good practices, etc. ☆11 · Sep 5, 2020 · Updated 5 years ago
- ☆18 · Jun 16, 2024 · Updated last year
- Examples and Quick Starts for Snowflake ☆11 · Apr 4, 2026 · Updated last month
- 🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena ☆30 · Jul 25, 2022 · Updated 3 years ago
- ☆32 · Jan 30, 2026 · Updated 4 months ago
- Hybrid Search (BM25 & Vector) with SQLite ☆33 · Aug 13, 2024 · Updated last year
- Bits of code I use during live demos ☆30 · Dec 19, 2024 · Updated last year
- ☆39 · Jun 1, 2022 · Updated 3 years ago
- EMR Hudi Workshop content ☆12 · Dec 10, 2021 · Updated 4 years ago
- Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post ☆11 · Nov 8, 2024 · Updated last year
- Docs, code, and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certific… ☆10 · Sep 25, 2019 · Updated 6 years ago
- Repository for the paper "Discovering and Categorising Language Biases in Reddit" accepted at the International Conference on Web and Soc… ☆12 · Aug 20, 2024 · Updated last year
- Set of Terraform scripts to spin up virtual lab infra for Cisco Cloud onRamp (CoR) for Multicloud ☆15 · Oct 25, 2023 · Updated 2 years ago
- Code to munge data between Kaggle .tsv Rotten Tomatoes Sentiment Analysis data set and Vowpal Wabbit ☆24 · Jun 22, 2014 · Updated 11 years ago
- This repository describes how to use avro-tools ☆12 · Feb 21, 2018 · Updated 8 years ago
- Serverless costs calculator for AWS Lambda ☆12 · Oct 21, 2020 · Updated 5 years ago
- Example code for running Spark and Hive jobs on EMR Serverless. ☆170 · May 14, 2026 · Updated 2 weeks ago
- ☆15 · Apr 4, 2021 · Updated 5 years ago
- Docker compose and Google Colab demo to build a CDC with Delta Lake ☆15 · Sep 7, 2022 · Updated 3 years ago
- Pair Trading Analysis & Exercises Toolkit [Jupyter Notebook] ☆12 · Nov 3, 2023 · Updated 2 years ago
- dbt / Amazon Redshift Demonstration Project ☆34 · Jan 3, 2023 · Updated 3 years ago
- Making the transition from Scratch to Python ☆11 · Apr 11, 2017 · Updated 9 years ago
- Unity Catalog Explorer is a TypeScript + Next.js based Web UI for the Unity Catalog OSS. ☆13 · Jun 29, 2024 · Updated last year
- SensitiveBye is a Java and Spring Boot toolkit focused on data masking. It helps you quickly meet the masking requirements in your project and supports masking of object fields, API fields, and database fields, as well as JSON serialization masking, log output masking, sensitive-term masking, Spring configuration file masking, and more. ☆12 · Jun 5, 2025 · Updated 11 months ago
- The source code for the book Modern Data Engineering with Apache Spark ☆40 · Jul 26, 2022 · Updated 3 years ago
- ☆12 · Aug 17, 2023 · Updated 2 years ago
- A dotnet standard wrapper for the Uniswap V2 Subgraph on The Graph GraphQL API. ☆12 · Dec 17, 2020 · Updated 5 years ago
- This repo demonstrates how to use AWS application auto-scaling to implement custom-scaling in your Kinesis Data Analytics for Apache Flin… ☆19 · Feb 21, 2025 · Updated last year
- These scripts clean the unused EBS volumes, AMIs and snapshots on Amazon Web Services. ☆11 · Jul 24, 2015 · Updated 10 years ago
- Build Multi-Account and Multi-VPC AWS network infrastructure with Network Shared Services (NSS) ☆11 · Apr 28, 2025 · Updated last year
- ☆15 · Dec 10, 2025 · Updated 5 months ago
- ☆74 · Jun 26, 2024 · Updated last year