gwenshap / lambda_s3_kafka
AWS Lambda function to get events in Kafka topic when files are uploaded to S3
☆24 · Updated 7 years ago
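The one-line description above (an S3 upload triggers a Lambda function that emits an event to a Kafka topic) can be sketched as follows. This is a minimal illustration, not the repository's actual code. It assumes the standard S3 event notification JSON shape. The topic name `s3-uploads`, the helper names `s3_event_to_messages` and `make_handler`, and the injectable `send`/`flush` callables are all hypothetical, so the sketch runs without a Kafka client installed.

```python
import json
import urllib.parse
from typing import Any, Callable, Dict, List

# Hypothetical topic name; the original repository's configuration is not shown here.
TOPIC = "s3-uploads"


def s3_event_to_messages(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn an S3 notification event into one message dict per object record."""
    messages = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        messages.append({
            "bucket": s3["bucket"]["name"],
            # S3 URL-encodes object keys in notifications (e.g. spaces arrive as '+').
            "key": urllib.parse.unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size"),
            "event": record.get("eventName"),
        })
    return messages


def make_handler(send: Callable[[str, bytes, bytes], None],
                 flush: Callable[[], None] = lambda: None):
    """Build a Lambda-style handler around any send(topic, key, value) function."""
    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, int]:
        msgs = s3_event_to_messages(event)
        for msg in msgs:
            # Key each Kafka message by object key; the value is the JSON-encoded record.
            send(TOPIC, msg["key"].encode("utf-8"), json.dumps(msg).encode("utf-8"))
        # Flush before returning so buffered messages are not lost when Lambda freezes.
        flush()
        return {"sent": len(msgs)}
    return handler
```

With the kafka-python client, for example, one could pass `send=lambda t, k, v: producer.send(t, key=k, value=v)` and `flush=producer.flush` for a `KafkaProducer` created outside the handler, so the connection is reused across warm invocations. Injecting the producer this way keeps the S3 parsing testable without a running broker.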
Alternatives and similar repositories for lambda_s3_kafka
Users who are interested in lambda_s3_kafka are comparing it to the libraries listed below.
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 3 years ago
- ☆65 · Updated last year
- Airflow training for the crunch conf ☆105 · Updated 7 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆48 · Updated last year
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆77 · Updated 7 years ago
- Airflow Unit Tests and Integration Tests ☆261 · Updated 3 years ago
- ☆110 · Updated last year
- Fully reproducible, Dockerized, step-by-step demo of how to stream tables from Postgres to Kafka/KSQL back to Postgres. Detailed blog p… ☆152 · Updated 4 years ago
- Apache Spark on AWS Lambda ☆157 · Updated 3 years ago
- How to build an awesome data engineering team ☆101 · Updated 6 years ago
- Terraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker… ☆84 · Updated 3 years ago
- ☆201 · Updated 2 years ago
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆175 · Updated 8 months ago
- Supporting repository for the blog post at https://medium.com/@stephane.maarek/how-to-use-apache-kafka-to-transform-a-batch-pipeline-into… ☆247 · Updated 2 years ago
- Performant Redshift data source for Apache Spark ☆141 · Updated 3 weeks ago
- ☆248 · Updated 6 years ago
- Example code for running Spark and Hive jobs on EMR Serverless. ☆168 · Updated last year
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline ☆76 · Updated 2 years ago
- Learn the Confluent Schema Registry & REST Proxy ☆196 · Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 · Updated 3 years ago
- Experiments and demonstrations of Avro and Protobuf serialisation ☆61 · Updated 3 years ago
- This repository contains recipes for Apache Pinot. ☆32 · Updated 11 months ago
- Use Airflow to move data from multiple MySQL databases to BigQuery ☆100 · Updated 5 years ago
- Benchmark data warehouses under Fivetran-like conditions ☆171 · Updated 3 years ago
- Read Delta tables without any Spark ☆47 · Updated last year
- Example DAGs using hooks and operators from Airflow Plugins ☆348 · Updated 7 years ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆67 · Updated 4 years ago
- ☆44 · Updated 2 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 · Updated 3 years ago
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆110 · Updated this week