gwenshap / lambda_s3_kafka
AWS Lambda function to get events in Kafka topic when files are uploaded to S3
☆24 Updated 6 years ago
Alternatives and similar repositories for lambda_s3_kafka:
Users who are interested in lambda_s3_kafka are comparing it to the libraries listed below.
- ☆47 Updated 5 months ago
- ☆10 Updated 6 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated 3 weeks ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 Updated last year
- Spark stream from kafka(json) to s3(parquet) ☆15 Updated 6 years ago
- An example PySpark project with pytest ☆17 Updated 7 years ago
- Cloudbox Labs blog code ☆35 Updated 6 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 Updated 9 months ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- An example Apache Beam project. ☆111 Updated 7 years ago
- This project describes how to write full ETL data pipeline using spark. ☆15 Updated 2 years ago
- KSQL Step-by-step tutorial using the basic functions of Apache Kafka's Streaming SQL Engine ☆10 Updated 5 years ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- Various Demos mostly based on docker environments ☆34 Updated 2 years ago
- Some AWS EMR examples ☆16 Updated 7 years ago
- Repository that showcases problems with Kafka rebalancing and explains how to fix them. Please visit our blog article to learn what Kafka… ☆10 Updated 4 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆48 Updated 2 years ago
- These are some code examples ☆55 Updated 5 years ago
- Examples of using the DataStax Apache Kafka Connector. ☆46 Updated last year
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆103 Updated last month
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆87 Updated 10 months ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆49 Updated last year
- Sample code to collect Apache Iceberg metrics for table monitoring ☆23 Updated 5 months ago
- Airflow workflow management platform chef cookbook. ☆70 Updated 5 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆76 Updated 6 years ago
- JSON schema parser for Apache Spark ☆81 Updated 2 years ago
- Experiments and demonstrations of AVRO, Protobuf serialisation ☆60 Updated 2 years ago
- Magic to help Spark pipelines upgrade ☆34 Updated 4 months ago
- AWS Big Data Certification ☆25 Updated 2 weeks ago