Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.
☆174 · Mar 6, 2021 · Updated 4 years ago
Alternatives and similar repositories for spark-dynamodb
Users who are interested in spark-dynamodb are comparing it to the libraries listed below.
- DynamoDB data source for Apache Spark ☆95 · Sep 2, 2021 · Updated 4 years ago
- Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB ☆228 · Jan 15, 2026 · Updated last month
- Single node, in-memory DataFrame analytics library. ☆43 · Sep 15, 2025 · Updated 5 months ago
- Kinesis Connector for Structured Streaming ☆138 · Jul 2, 2024 · Updated last year
- Paper: A Zero-rename committer for object stores ☆20 · Nov 7, 2025 · Updated 3 months ago
- ☆24 · Oct 3, 2023 · Updated 2 years ago
- 🐋 Docker image for AWS Glue Spark/Python ☆23 · Sep 5, 2023 · Updated 2 years ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆62 · Jun 15, 2023 · Updated 2 years ago
- Project to build WebLogic Domains with Oracle Fusion Middleware 12c components using scripts. ☆12 · Jul 13, 2018 · Updated 7 years ago
- A Serverless function for posting to a Slack Webhook in response to a Mailgun route ☆11 · Oct 12, 2016 · Updated 9 years ago
- Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies… ☆22 · Jan 10, 2019 · Updated 7 years ago
- unix domain sockets that look just like tcp sockets ☆11 · Jun 21, 2018 · Updated 7 years ago
- Implementation of the query event listener plugin in Java to log Presto statistics on Amazon EMR for auditing and performance insights ☆13 · May 26, 2018 · Updated 7 years ago
- ☆18 · Nov 4, 2024 · Updated last year
- Serverless function to automate enforcement of Multi-Factor Authentication (MFA) to all AWS IAM users with access to AWS Management Conso… ☆13 · Oct 30, 2018 · Updated 7 years ago
- A lightweight Scala DSL for system testing REST web services ☆24 · Jun 19, 2014 · Updated 11 years ago
- Performant Redshift data source for Apache Spark ☆140 · Jan 15, 2026 · Updated last month
- Explore the use of different patterns to produce clean code ☆21 · Oct 11, 2014 · Updated 11 years ago
- Project to concentrate files and settings for AWS EMR monitoring. Source: https://aws.amazon.com/blogs/big-data/monitor-and-optimize-anal… ☆15 · Oct 11, 2024 · Updated last year
- This repo demonstrates how to use AWS application auto-scaling to implement custom-scaling in your Kinesis Data Analytics for Apache Flin… ☆19 · Feb 21, 2025 · Updated last year
- DynamoDB Local SBT plugin - NO LONGER MAINTAINED, SEE: ☆14 · Sep 28, 2015 · Updated 10 years ago
- Simpler DynamoDB access for Scala ☆318 · Dec 12, 2025 · Updated 2 months ago
- A library for Spark DataFrame using MinIO Select API ☆101 · Sep 27, 2019 · Updated 6 years ago
- A Terraform module to create an Amazon Web Services (AWS) Elastic MapReduce (EMR) cluster. ☆39 · Oct 21, 2019 · Updated 6 years ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. ☆70 · Feb 21, 2024 · Updated 2 years ago
- A Spark library for Amazon SageMaker. ☆301 · Mar 8, 2025 · Updated 11 months ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 · Sep 29, 2020 · Updated 5 years ago
- Reference Architectures for Datalakes on AWS ☆78 · May 13, 2020 · Updated 5 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/ ) which creates a concu… ☆77 · Oct 30, 2018 · Updated 7 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆816 · Updated this week
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 5 months ago
- spray based client for aws ☆38 · May 4, 2016 · Updated 9 years ago
- An operator for running Pomerium on a Kubernetes cluster. ☆27 · May 23, 2022 · Updated 3 years ago
- Edit code in IntelliJ, eval/run in Zeppelin notebook ☆18 · Mar 17, 2019 · Updated 6 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 · Feb 13, 2020 · Updated 6 years ago
- DynamoDBJournal for Akka Persistence ☆85 · Oct 18, 2024 · Updated last year
- Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article. ☆21 · Jan 30, 2019 · Updated 7 years ago
- Gitbook Repo for Practical Data Pipeline ☆25 · Feb 4, 2022 · Updated 4 years ago
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆227 · Mar 19, 2025 · Updated 11 months ago