Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
☆47 · Jul 13, 2022 · Updated 3 years ago
Alternatives and similar repositories for modern-data-lake-storage-layers
Users who are interested in modern-data-lake-storage-layers are comparing it to the libraries listed below.
- A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA, and QuickSight. With a series of best practices, it guides y… ☆16 · Nov 22, 2022 · Updated 3 years ago
- This repository provides the resources required for the Amazon Redshift Streaming workshop ☆13 · Jul 12, 2023 · Updated 2 years ago
- Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR ☆17 · Apr 27, 2025 · Updated 11 months ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆53 · Oct 31, 2023 · Updated 2 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆66 · Sep 23, 2023 · Updated 2 years ago
- Delta-Lake, ETL, Spark, Airflow ☆49 · Oct 9, 2022 · Updated 3 years ago
- Auto-fixing errors due to version upgrades, good practices, etc. ☆11 · Sep 5, 2020 · Updated 5 years ago
- ☆16 · May 9, 2022 · Updated 3 years ago
- ☆18 · Jun 16, 2024 · Updated last year
- ☆11 · Apr 27, 2021 · Updated 4 years ago
- 🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena ☆30 · Jul 25, 2022 · Updated 3 years ago
- AWS Lambda function that automatically PGP-encrypts files added to an S3 bucket ☆16 · May 3, 2022 · Updated 3 years ago
- FHIR to OMOP using PySpark on AWS Glue ☆14 · May 8, 2021 · Updated 4 years ago
- Bits of code I use during live demos ☆30 · Dec 19, 2024 · Updated last year
- Sample datasets and code for operationalizing Amazon Fraud Detector using SageMaker DataWrangler, Feature Store, and Pipelines. ☆18 · Dec 1, 2022 · Updated 3 years ago
- ☆39 · Jun 1, 2022 · Updated 3 years ago
- EMR Hudi Workshop content ☆12 · Dec 10, 2021 · Updated 4 years ago
- Machine learning enhancements to Spark MLlib ☆20 · Mar 19, 2015 · Updated 11 years ago
- Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post ☆11 · Nov 8, 2024 · Updated last year
- Repository dedicated to the Data Lakehouse with Delta Lake workshop ☆17 · Dec 6, 2021 · Updated 4 years ago
- ☆18 · Apr 14, 2023 · Updated 2 years ago
- Template for a modular, Python-based data science project. ☆41 · Apr 9, 2024 · Updated last year
- Docs, code, and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certific… ☆10 · Sep 25, 2019 · Updated 6 years ago
- Set of Terraform scripts to spin up virtual lab infra for Cisco Cloud onRamp (CoR) for Multicloud ☆15 · Oct 25, 2023 · Updated 2 years ago
- Code to munge data between Kaggle .tsv Rotten Tomatoes Sentiment Analysis data set and Vowpal Wabbit ☆24 · Jun 22, 2014 · Updated 11 years ago
- Serverless costs calculator for AWS Lambda ☆12 · Oct 21, 2020 · Updated 5 years ago
- ☆21 · Dec 3, 2025 · Updated 3 months ago
- Example code for running Spark and Hive jobs on EMR Serverless. ☆169 · Mar 11, 2026 · Updated 2 weeks ago
- A FHIR implementation guide that supports conversion of data from FHIR to OMOP and OMOP to FHIR ☆15 · Mar 20, 2026 · Updated last week
- ☆15 · Apr 4, 2021 · Updated 4 years ago
- Docker compose and Google Colab demo to build a CDC with Delta Lake ☆15 · Sep 7, 2022 · Updated 3 years ago
- dbt / Amazon Redshift Demonstration Project ☆34 · Jan 3, 2023 · Updated 3 years ago
- Making the transition from Scratch to Python ☆11 · Apr 11, 2017 · Updated 8 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆46 · Sep 8, 2020 · Updated 5 years ago
- A repository for the blog 'AWS lambda unit testing with Python' ☆19 · Nov 11, 2018 · Updated 7 years ago
- ☆11 · Oct 13, 2025 · Updated 5 months ago
- Amazon EMR Serverless and Amazon MSK Serverless Demo ☆13 · Jul 31, 2022 · Updated 3 years ago
- ☆12 · Aug 17, 2023 · Updated 2 years ago
- Quickly show esriJson, geoJson, or WKT on a map, or draw on a map to get esriJson, geoJson, or WKT ☆20 · Aug 16, 2022 · Updated 3 years ago