gwenshap / lambda_s3_kafka
AWS Lambda function to get events into a Kafka topic when files are uploaded to S3
☆24 · Updated 6 years ago
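The flow this repository describes (S3 upload notification, then Lambda, then a Kafka topic) can be sketched in Python. The sketch assumes the standard S3 event notification shape that S3 delivers to a Lambda function. It is not the repository's code. The topic name `s3-uploads`, the message fields, and the `send(topic, value)` hook are hypothetical; in a real function, `send` would wrap whatever Kafka producer client you use.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import unquote_plus

# Hypothetical topic name; not taken from the lambda_s3_kafka repository.
TOPIC = "s3-uploads"


def s3_event_to_messages(event: Dict) -> List[Dict]:
    """Turn an S3 event notification into one message dict per uploaded object."""
    messages = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        messages.append({
            "event": record.get("eventName"),
            "bucket": s3["bucket"]["name"],
            # S3 URL-encodes object keys in notifications (spaces arrive as '+').
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size"),
        })
    return messages


def make_handler(send: Callable[[str, bytes], None]):
    """Build a Lambda-style handler. `send(topic, value)` is an assumed hook
    that would wrap a real Kafka producer client."""
    def lambda_handler(event, context):
        msgs = s3_event_to_messages(event)
        for msg in msgs:
            # Serialize each message as UTF-8 JSON bytes before sending.
            send(TOPIC, json.dumps(msg).encode("utf-8"))
        return {"sent": len(msgs)}
    return lambda_handler
```

Injecting `send` keeps the event parsing testable without a running broker. For example, `make_handler(lambda t, v: print(t, v))` prints each message instead of producing it to Kafka.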
Alternatives and similar repositories for lambda_s3_kafka
Users who are interested in lambda_s3_kafka are comparing it to the libraries listed below.
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 5 years ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 · Updated last year
- Real-time anomaly detection using Kafka, KSQL User Defined Function and a pre-trained model ☆30 · Updated last year
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 · Updated last year
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 · Updated last year
- Spark stream from Kafka (JSON) to S3 (Parquet) ☆15 · Updated 6 years ago
- Real-world Spark pipelines examples ☆83 · Updated 7 years ago
- ☆57 · Updated 10 months ago
- Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark ☆13 · Updated 2 years ago
- An example PySpark project with pytest ☆16 · Updated 7 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 · Updated 6 years ago
- An example Apache Beam project. ☆111 · Updated 8 years ago
- ☆10 · Updated 6 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 · Updated 9 months ago
- Examples of using the DataStax Apache Kafka Connector. ☆46 · Updated last year
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated 4 months ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. ☆72 · Updated last year
- These are some code examples ☆55 · Updated 5 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 · Updated last year
- This project describes how to write full ETL data pipeline using spark. ☆15 · Updated 2 years ago
- Presto Trino with Apache Hive Postgres metastore ☆41 · Updated 8 months ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆64 · Updated last year
- Code snippets used in demos recorded for the blog. ☆37 · Updated last month
- Kafka Connect playground ☆10 · Updated 5 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆25 · Updated 5 years ago
- Spark and Hive docker containers sharing a common MySQL metastore ☆26 · Updated 5 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆75 · Updated 6 years ago
- This repository contains recipes for Apache Pinot. ☆30 · Updated 3 months ago
- Interactive Notebooks that support the book ☆40 · Updated 4 years ago