FINRAOS / herd
Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.
☆138Updated 3 years ago
Alternatives and similar repositories for herd
Users who are interested in herd are comparing it to the libraries listed below.
- Autoscaling EMR clusters and Kinesis streams on Amazon Web Services (AWS)☆47Updated last year
- Apache Spark on AWS Lambda☆154Updated 2 years ago
- DataPipeline for humans.☆249Updated 3 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆90Updated last year
- Tool to generate a Hive schema from a JSON example doc☆227Updated 5 years ago
- Amazon Elastic MapReduce code samples☆63Updated 10 years ago
- Bender - Serverless ETL Framework☆188Updated last year
- kinesis-kafka-connector is a connector based on Kafka Connect to publish messages to Amazon Kinesis streams or Amazon Kinesis Firehose.☆157Updated last year
- Apache Spark AWS Lambda Executor (SAMBA)☆44Updated 7 years ago
- Simplify getting Zeppelin up and running☆56Updated 9 years ago
- Collection of tools for bootstrapping Apache Ambari & deploying clusters☆83Updated 6 years ago
- Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB☆228Updated 5 months ago
- Demonstrates NiFi template deployment and configuration via a REST API☆70Updated 8 years ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service.☆72Updated last year
- Ferry lets you define, run, and deploy big data applications on AWS, OpenStack, and your local machine using Docker☆253Updated 10 years ago
- Redshift Ops Console☆92Updated 9 years ago
- Cloudbreak Deployer Tool☆34Updated 2 years ago
- Cloudera Director sample code☆61Updated 5 years ago
- DynamoDB data source for Apache Spark☆95Updated 4 years ago
- Generates more or less realistic log data for testing simple aggregation queries.☆260Updated last year
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions☆612Updated 2 years ago
- kafka-connect-s3: Ingest data from Kafka to object stores (S3)☆95Updated 6 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup.☆47Updated 5 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/ ) which creates a concu…☆77Updated 6 years ago
- ☆205Updated 2 years ago
- Vagrant files creating multi-node virtual Hadoop clusters with or without security.☆67Updated 5 years ago
- An example Apache Beam project.☆111Updated 8 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…☆52Updated 3 months ago
- A rough prototype of a tool for discovering Apache Hive schemas from JSON documents.☆42Updated last year
- [DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine☆109Updated 5 years ago