FINRAOS / herd
Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.
☆135 Updated 2 years ago
Alternatives and similar repositories for herd:
Users who are interested in herd are comparing it to the libraries listed below.
- Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB ☆224 Updated this week
- Apache Spark on AWS Lambda ☆151 Updated 2 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/ ) which creates a concu… ☆75 Updated 6 years ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. ☆71 Updated last year
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆220 Updated last month
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆84 Updated 2 years ago
- Reference Architectures for Data Lakes on AWS ☆79 Updated 4 years ago
- A Spark-based data comparison tool at scale that helps software development engineers compare a plethora of pair combinations o… ☆51 Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆62 Updated last year
- Bender - Serverless ETL Framework ☆185 Updated last year
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆64 Updated last year
- kafka-connect-s3: Ingest data from Kafka to Object Stores (S3) ☆95 Updated 6 years ago
- kinesis-kafka-connector is a connector based on Kafka Connect to publish messages to Amazon Kinesis streams or Amazon Kinesis Firehose. ☆155 Updated last year
- This repository is to help with the Partner Demonstration of the Apache Atlas project. ☆30 Updated 9 years ago
- DataPipeline for humans. ☆251 Updated 2 years ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- Sample Apache Flink application that can be deployed to Kinesis Analytics for Java. It reads taxi events from a Kinesis data stream, proc… ☆85 Updated last year
- CloudFormation and SQL scripts used to replicate a POC environment from the "Data Lake to Data Warehouse: Enhancing Customer 360 with Ama… ☆31 Updated 5 years ago
- Sample Apache Beam pipeline that can be deployed to Amazon Managed Service for Apache Flink. It reads taxi events from a Kinesis data str… ☆47 Updated last year
- Redshift Ops Console ☆92 Updated 9 years ago
- Autoscaling EMR clusters and Kinesis streams on Amazon Web Services (AWS) ☆47 Updated last year
- Ferry lets you define, run, and deploy big data applications on AWS, OpenStack, and your local machine using Docker ☆253 Updated 9 years ago
- Tool to generate a Hive schema from a JSON example doc ☆228 Updated 5 years ago
- Amazon Kinesis Aggregators provides a simple way to create real time aggregations of data on Amazon Kinesis. ☆150 Updated 4 years ago
- A tool for moving tables from Redshift to BigQuery ☆65 Updated 6 years ago
- A solution describing a data-processing design pattern for streaming data through Kinesis and Spark Streaming in real time. ☆38 Updated 10 months ago
- ☆24 Updated last year
- File compaction tool that runs on top of the Spark framework. ☆59 Updated 6 years ago
- s3mper - Consistent Listing for S3 ☆228 Updated 2 years ago