stripe-archive/herringbone

Tools for working with Parquet, Impala, and Hive

☆135

Alternatives and similar repositories for herringbone

Users who are interested in herringbone are comparing it to the libraries listed below.

stripe-archive / timberlake
Timberlake is a Job Tracker for Hadoop.
☆177 · Jan 24, 2020 · Updated 6 years ago
elodina / syscol
Collects local Mesos slave, underlying operating system, and machine metrics and produces them to Apache Kafka
☆20 · Jan 29, 2016 · Updated 10 years ago
med-at-scale / high-health
Integrate the GA4GH schemas and probably a Scala implementation of the service.
☆14 · May 20, 2016 · Updated 10 years ago
patrickangeles / cdh-maven-archetype
Cloudera Maven Archetypes
☆18 · Sep 7, 2011 · Updated 14 years ago
avibryant / simmer
Reduce your data. A unix filter for algebird-powered aggregation.
☆141 · Apr 17, 2017 · Updated 9 years ago
spotify / crunch-lib
Useful reusable pipeline components for Crunch jobs
☆27 · Feb 10, 2015 · Updated 11 years ago
etsy / arbiter
A utility for generating Oozie workflows from a YAML definition
☆50 · Mar 4, 2019 · Updated 7 years ago
laserson / impyla-old
OLD: impyla is now developed at `cloudera/impyla`
☆23 · Apr 16, 2014 · Updated 12 years ago
jatrost / hadoop-binary-analysis
Framework that makes processing arbitrary binary data in Hadoop easier
☆22 · Apr 8, 2013 · Updated 13 years ago
ImpalaToGo / ImpalaToGo
Fork of Cloudera Impala separated from Hadoop
☆42 · Jul 13, 2016 · Updated 10 years ago
laserson / dsq
Distributed Streaming Quantiles (for PySpark)
☆38 · Jan 30, 2014 · Updated 12 years ago
twitter-archive / jaqen
A type-safe heterogeneous Map or a named-field Tuple
☆35 · Nov 8, 2014 · Updated 11 years ago
daithiocrualaoich / spark-emr
Spark Elastic MapReduce bootstrap and runnable examples.
☆17 · Jun 26, 2013 · Updated 13 years ago
ianoc / SparkEMRBootstrap
Files to help make new Spark EMR bootstraps
☆15 · Aug 4, 2013 · Updated 12 years ago
piersharding / dplyrimpaladb
R dplyr connector for ImpalaDB
☆15 · Mar 1, 2017 · Updated 9 years ago
ottogroup / schedoscope
Schedoscope is a scheduling framework for pain-free agile development, testing, (re)loading, and monitoring of your datahub, lake, or what…
☆98 · Nov 14, 2019 · Updated 6 years ago
tresata / spark-sorted
Secondary sort and streaming reduce for Apache Spark
☆77 · Jul 3, 2023 · Updated 3 years ago
kite-sdk / kite-examples
Kite SDK Examples
☆99 · May 8, 2021 · Updated 5 years ago
hammerlab / immuno
Use somatic mutations to choose a personalized cancer vaccine (tumor-specific immunogenic peptides)
☆16 · Sep 23, 2016 · Updated 9 years ago
Automattic / cm-livy-scripts
Scripts for building Cloudera Manager parcel and CSD for Livy Spark Server
☆21 · Oct 18, 2017 · Updated 8 years ago
d2iq-archive / mesos-utils
Utilities for building distributed systems on top of Mesos
☆23 · Aug 25, 2018 · Updated 7 years ago
pinterest / terrapin
Serving system for batch generated data sets
☆179 · May 11, 2017 · Updated 9 years ago
owainlewis / activator-akka-http
A Typesafe Activator template for Akka HTTP microservices
☆13 · Jul 5, 2026 · Updated 2 weeks ago
tresata / ganitha
Scalding-powered machine learning
☆109 · Nov 18, 2014 · Updated 11 years ago
nfergu / popstrat
Population Stratification Analysis on Genomics Data Using Deep Learning
☆25 · Sep 5, 2016 · Updated 9 years ago
massie / spark-parquet-example
Example project to show how to use Spark to read and write Avro/Parquet files
☆50 · Aug 21, 2013 · Updated 12 years ago
TAwarehouse / backup-hadoop-and-hive
☆21 · May 9, 2012 · Updated 14 years ago
ogrodnek / spark-plug
Scala driver for launching Amazon EMR jobs
☆40 · Feb 10, 2016 · Updated 10 years ago
rkuhn / asynctest
Minimalistic JUnit-style framework for testing asynchronous components
☆37 · Sep 6, 2014 · Updated 11 years ago
brightcove-archive / ooyala_scamr
A Hadoop map reduce framework for Scala.
☆15 · Apr 21, 2016 · Updated 10 years ago
agourlay / omnibus
An HTTP-friendly persistent message bus.
☆71 · Aug 22, 2015 · Updated 10 years ago
amplab / smash
Benchmarking toolkit for variant calling
☆48 · Oct 13, 2020 · Updated 5 years ago
theclaymethod / Foundry-vagrant-mesos-kafka-cluster
A Vagrant/Ansible => Kafka, Mesos (w/ Marathon/Docker), ZK, Hadoop, and Spark. Service discovery via HAProxy and Bamboo.
☆50 · Dec 3, 2014 · Updated 11 years ago
stripe-archive / sequins
A key/value store for serving static batch data
☆174 · Jul 14, 2023 · Updated 3 years ago
sequenceiq / periscope
Periscope brings SLA policy based autoscaling to Hadoop
☆35 · Jan 25, 2016 · Updated 10 years ago
brightcove-archive / ooyala_spark-jobserver
REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver…
☆345 · May 19, 2017 · Updated 9 years ago
AndreSchumacher / avro-parquet-spark-example
An example of using Avro and Parquet in Spark SQL
☆60 · Nov 16, 2015 · Updated 10 years ago
tresata / spark-kafka
Low level integration of Spark and Kafka
☆129 · Mar 15, 2018 · Updated 8 years ago
tresata / spark-columnar
☆15 · Mar 4, 2015 · Updated 11 years ago