FINRAOS / herd
Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.
☆135 Updated 2 years ago
Alternatives and similar repositories for herd:
Users who are interested in herd are comparing it to the repositories listed below:
- Redshift Ops Console ☆92 Updated 9 years ago
- Autoscaling EMR clusters and Kinesis streams on Amazon Web Services (AWS) ☆47 Updated last year
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Updated 5 years ago
- Apache Spark on AWS Lambda ☆151 Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆50 Updated last year
- DataPipeline for humans. ☆251 Updated 2 years ago
- Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. ☆71 Updated last year
- kinesis-kafka-connector is a connector based on Kafka Connect to publish messages to Amazon Kinesis streams or Amazon Kinesis Firehose. ☆155 Updated last year
- Amazon Redshift Advanced Monitoring ☆272 Updated 2 years ago
- An open-source, vendor-neutral data context service. ☆159 Updated 7 years ago
- Reference Architectures for Datalakes on AWS ☆79 Updated 4 years ago
- Simplify getting Zeppelin up and running ☆56 Updated 8 years ago
- Create Parquet files from CSV ☆67 Updated 7 years ago
- Amazon Elastic MapReduce code samples ☆63 Updated 9 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 Updated last year
- Amazon Kinesis Aggregators provides a simple way to create real time aggregations of data on Amazon Kinesis. ☆150 Updated 3 years ago
- Bender - Serverless ETL Framework ☆185 Updated last year
- Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB ☆223 Updated last month
- Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses. ☆116 Updated 2 years ago
- Tool to generate a Hive schema from a JSON example doc ☆228 Updated 5 years ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 Updated last year
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆62 Updated last year
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆75 Updated 6 years ago
- A tool for moving tables from Redshift to BigQuery ☆65 Updated 6 years ago
- Twitter Sentiment using Spark + MapRDB + Drill + ES + Kibana ☆9 Updated 7 years ago
- Vagrant files creating multi-node virtual Hadoop clusters with or without security. ☆67 Updated 4 years ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- A rough prototype of a tool for discovering Apache Hive schemas from JSON documents. ☆42 Updated last year
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆216 Updated last month