gwenshap / lambda_s3_kafka
AWS Lambda function that publishes an event to a Kafka topic whenever a file is uploaded to S3
☆24 · Updated 6 years ago
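The flow described above (an S3 upload triggers a Lambda, which emits an event to Kafka) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. The names `s3_records_to_messages` and `make_handler`, the default topic `s3-uploads`, and the message fields are all hypothetical. The sketch assumes the standard S3 event notification layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) and takes the Kafka producer's send function as a parameter, so it runs without a broker.

```python
import json
from urllib.parse import unquote_plus


def s3_records_to_messages(event):
    """Convert an S3 event notification into JSON-encoded Kafka message values."""
    messages = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        payload = {
            "event": record.get("eventName"),
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size"),
        }
        messages.append(json.dumps(payload).encode("utf-8"))
    return messages


def make_handler(send, topic="s3-uploads"):
    """Build a Lambda handler around `send(topic, value)`.

    `send` is an assumption of this sketch: any callable that delivers
    a bytes value to the named Kafka topic.
    """
    def handler(event, context):
        messages = s3_records_to_messages(event)
        for message in messages:
            send(topic, message)
        return {"published": len(messages)}

    return handler
```

In a real deployment, `send` could plausibly be the `send` method of a `KafkaProducer` from the kafka-python package, which accepts a topic and a bytes value. The producer would likely also need to be flushed before the handler returns, since Lambda may freeze the process afterwards.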
Related projects
Alternatives and complementary repositories for lambda_s3_kafka
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated 11 months ago
- Spark stream from Kafka (JSON) to S3 (Parquet) ☆15 · Updated 6 years ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 · Updated last year
- A K8s-based infrastructure for analytics ☆24 · Updated 4 years ago
- JSON schema parser for Apache Spark ☆81 · Updated 2 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 5 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆76 · Updated 6 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆48 · Updated 10 months ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆25 · Updated 5 years ago
- An example Apache Beam project. ☆111 · Updated 7 years ago
- Learn the Confluent Schema Registry & REST Proxy ☆187 · Updated 8 months ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- These are some code examples ☆55 · Updated 4 years ago
- Real-world Spark pipelines examples ☆83 · Updated 6 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆60 · Updated 2 months ago
- Sample processing code using Spark 2.1+ and Scala ☆51 · Updated 4 years ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 · Updated last year
- Kafka Examples repository. ☆43 · Updated 5 years ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated last year
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆33 · Updated 11 months ago
- Data validation library for PySpark 3.0.0 ☆34 · Updated 2 years ago
- Docker image to submit Spark applications ☆38 · Updated 6 years ago
- Experiments and demonstrations of AVRO, Protobuf serialisation ☆60 · Updated last year
- AWS Big Data Certification ☆25 · Updated last year
- This project describes how to write a full ETL data pipeline using Spark. ☆15 · Updated 2 years ago