gwenshap / lambda_s3_kafka
AWS Lambda function to get events in Kafka topic when files are uploaded to S3
☆24 Updated 6 years ago
Alternatives and similar repositories for lambda_s3_kafka:
Users who are interested in lambda_s3_kafka are comparing it to the repositories listed below.
- ☆52 Updated 7 months ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated 2 months ago
- Spark stream from Kafka (JSON) to S3 (Parquet) ☆15 Updated 6 years ago
- Magic to help Spark pipelines upgrade ☆34 Updated 5 months ago
- ☆10 Updated 6 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated this week
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka. ☆62 Updated last year
- Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark ☆13 Updated last year
- A PySpark lib to validate data quality ☆18 Updated 2 years ago
- Code that was used as an example during the Data+AI Summit 2020 ☆15 Updated 4 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 Updated 5 years ago
- Some AWS EMR examples ☆16 Updated 7 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆76 Updated 6 years ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 Updated last year
- A facebook for data ☆26 Updated 5 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆50 Updated last year
- Spark and Hive docker containers sharing a common MySQL metastore ☆26 Updated 4 years ago
- Examples for High Performance Spark ☆15 Updated 4 months ago
- dbt (data build tool) projects targeting AWS analytics services (Redshift, Glue, EMR, Athena) and open table formats ☆29 Updated last year
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. ☆96 Updated 4 years ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆84 Updated 2 years ago
- A curated list of awesome lakehouse frameworks, applications, etc ☆23 Updated 3 weeks ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆25 Updated 7 months ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆48 Updated 2 years ago
- Real-world Spark pipelines examples ☆83 Updated 7 years ago
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆104 Updated 3 months ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago