FINRAOS / herd
Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.
☆135 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for herd
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 · Updated 8 months ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆76 · Updated 6 years ago
- Apache Spark on AWS Lambda ☆151 · Updated last year
- Amazon Kinesis Aggregators provides a simple way to create real-time aggregations of data on Amazon Kinesis. ☆151 · Updated 3 years ago
- An opinionated auto-deployer for the Hortonworks Platform ☆34 · Updated 3 years ago
- Synchronizes metadata changes (create/drop table/partition) from the Hive Metastore to the AWS Glue Data Catalog ☆33 · Updated 11 months ago
- Cloudera Director sample code ☆60 · Updated 5 years ago
- A Spark-based data comparison tool at scale that helps software development engineers compare a plethora of pair combinations o… ☆48 · Updated 10 months ago
- An open-source, vendor-neutral data context service. ☆159 · Updated 6 years ago
- Autoscaling EMR clusters and Kinesis streams on Amazon Web Services (AWS) ☆47 · Updated 11 months ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. ☆70 · Updated 9 months ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 · Updated last year
- kinesis-kafka-connector is a connector based on Kafka Connect that publishes messages to Amazon Kinesis streams or Amazon Kinesis Firehose. ☆153 · Updated last year
- Redshift Ops Console ☆92 · Updated 9 years ago
- Apiary provides modules that can be combined to create a federated cloud data lake ☆36 · Updated 7 months ago
- CloudFormation templates for deploying Airflow in ECS ☆40 · Updated 5 years ago
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆205 · Updated 6 months ago
- Implementations of open-source Apache Hadoop/Hive interfaces that allow ingesting data from Amazon DynamoDB ☆217 · Updated 2 weeks ago
- Demonstrates NiFi template deployment and configuration via a REST API ☆68 · Updated 7 years ago
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆83 · Updated last year
- Vagrant files that create multi-node virtual Hadoop clusters, with or without security. ☆67 · Updated 4 years ago
- Reference architectures for data lakes on AWS ☆79 · Updated 4 years ago
- A tool for moving tables from Redshift to BigQuery ☆65 · Updated 5 years ago
- This repository supports the Partner Demonstration of the Apache Atlas project. ☆30 · Updated 9 years ago
- Amazon Redshift Advanced Monitoring ☆268 · Updated last year
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 · Updated 4 years ago
- Bender - Serverless ETL Framework ☆186 · Updated 11 months ago
- Kinesis spout for Storm ☆106 · Updated 6 years ago