Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud, letting you manage petabytes of data and make it accessible for data processing and analytics by any cloud compute platform.
☆139 · Oct 1, 2022 · Updated 3 years ago
Alternatives and similar repositories for herd
Users who are interested in herd are comparing it to the libraries listed below.
- Herd-UI is a search and discovery tool for business and technical users. Everyone in your organization can use Herd-UI to browse and unde… ☆16 · Oct 1, 2022 · Updated 3 years ago
- Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information. ☆15 · Jul 17, 2024 · Updated last year
- Hortonworks Data Platform Data Generation Tool ☆13 · Nov 30, 2017 · Updated 8 years ago
- Apache Zeppelin service for Apache Ambari: installation and management of Zeppelin via Ambari. ☆14 · Jan 23, 2016 · Updated 10 years ago
- An AWS SDK-backed FileSystem driver for Hadoop ☆64 · Oct 13, 2020 · Updated 5 years ago
- Gatekeeper is a self-serviced web application allowing users to make requests for temporary access to EC2 & RDS instances running in AWS … ☆28 · Dec 16, 2023 · Updated 2 years ago
- Hadoop YARN & MapReduce Memory Calculator ☆13 · Nov 9, 2015 · Updated 10 years ago
- Tutorials for Cascading, Lingual, Pattern and other projects ☆18 · Aug 30, 2016 · Updated 9 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ☆18 · Jun 28, 2021 · Updated 4 years ago
- kafka-connect-s3: Ingest data from Kafka to object stores (S3) ☆95 · Apr 4, 2019 · Updated 7 years ago
- An example Spring Boot application demonstrating how to configure and bootstrap a Pivotal GemFire Server in a Spring context, JVM-based p… ☆12 · May 11, 2018 · Updated 7 years ago
- Ambari Service definition for deploying R & RHadoop libraries ☆18 · Aug 3, 2015 · Updated 10 years ago
- Dockerfile and artifacts for running a self-contained HDP 2.3 "cluster" in a docker container ☆10 · Aug 30, 2016 · Updated 9 years ago
- Quickly deploy Hadoop with the help of Ansible and Apache Ambari ☆38 · Jul 15, 2015 · Updated 10 years ago
- Kerberos/SPNEGO custom realm for Elasticsearch Shield 2.0 ☆16 · Jan 19, 2018 · Updated 8 years ago
- ☆205 · May 23, 2023 · Updated 2 years ago
- Apache Drill Workshop ☆19 · Apr 4, 2016 · Updated 10 years ago
- Library and framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka ☆25 · Oct 16, 2020 · Updated 5 years ago
- A bunch of utility classes for Java, Hadoop, HBase, Pig, etc. ☆77 · Mar 31, 2014 · Updated 12 years ago
- Last-seen sketch implementation in Go ☆16 · Dec 15, 2020 · Updated 5 years ago
- Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream … ☆22 · Feb 6, 2017 · Updated 9 years ago
- Ceph on Mesos ☆20 · Apr 8, 2017 · Updated 9 years ago
- Docker image to deploy a RabbitMQ cluster on Mesos with one Marathon app ☆10 · Oct 12, 2017 · Updated 8 years ago
- Simplify getting Zeppelin up and running ☆56 · Jul 20, 2016 · Updated 9 years ago
- Ambari service for Apache Drill ☆17 · Apr 15, 2016 · Updated 10 years ago
- Automated solution to copy and obfuscate production data to target environments in AWS ☆25 · May 22, 2023 · Updated 2 years ago
- Example static schema registry for Iglu ☆15 · Jun 21, 2023 · Updated 2 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆162 · Oct 4, 2022 · Updated 3 years ago
- Developing Spark External Data Sources using the V2 API ☆50 · Apr 29, 2018 · Updated 8 years ago
- Aphelion is a web application that captures and visualizes your AWS services usage limits. It continuously collects data in the backgroun… ☆34 · Mar 31, 2021 · Updated 5 years ago
- Open source task scheduler with dependency management ☆15 · Jul 1, 2018 · Updated 7 years ago
- Reusable code for Hive ☆16 · Aug 19, 2014 · Updated 11 years ago
- List of some interesting projects ☆32 · Dec 24, 2019 · Updated 6 years ago
- Erlang Kinesis Client ☆38 · Aug 8, 2025 · Updated 8 months ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka ☆30 · Jun 8, 2016 · Updated 9 years ago
- Cloudbreak Deployer Tool ☆34 · Jun 29, 2023 · Updated 2 years ago
- Combination of Dockerized Hortonworks projects and other Hadoop ecosystem components ☆10 · Oct 11, 2019 · Updated 6 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆93 · Mar 5, 2024 · Updated 2 years ago
- Functional, Typesafe, Declarative Data Pipelines ☆140 · Jan 29, 2018 · Updated 8 years ago