twitter/elephant-bird

twitter / elephant-bird

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

☆1,134

Alternatives and similar repositories for elephant-bird

Users who are interested in elephant-bird are comparing it to the libraries listed below.

LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
twitter / hadoop-lzo
Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
☆548 · Apr 24, 2024 · Updated 2 years ago
twitter-archive / ambrose
A platform for visualization and real-time monitoring of data workflows
☆1,170 · Jan 22, 2020 · Updated 6 years ago
twitter / summingbird
Streaming MapReduce with Scalding and Storm
☆2,123 · Jan 19, 2022 · Updated 4 years ago
tdunning / pig-vector
Mahout vector encoding for Pig
☆53 · Nov 20, 2022 · Updated 3 years ago
YahooArchive / howl
Common metadata layer for Hadoop's MapReduce, Pig, and Hive
☆77 · Feb 17, 2011 · Updated 15 years ago
nathanmarz / storm
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
☆8,770 · Aug 16, 2017 · Updated 8 years ago
twitter / scalding
A Scala API for Cascading
☆3,523 · May 28, 2023 · Updated 3 years ago
YahooArchive / oozie
Oozie - workflow engine for Hadoop
☆373 · Jun 8, 2017 · Updated 9 years ago
alienrobotwizard / sounder
A grouping of Apache Pig examples.
☆65 · Oct 13, 2020 · Updated 5 years ago
amplab / shark
Development of Shark has ended.
☆992 · Aug 11, 2015 · Updated 10 years ago
cloudera / flume
WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for effici…
☆943 · May 26, 2021 · Updated 5 years ago
mozilla-metrics / akela
A bunch of utility classes for Java, Hadoop, HBase, Pig, etc.
☆77 · Mar 31, 2014 · Updated 12 years ago
nathanmarz / elephantdb
Distributed database specialized in exporting key/value data from Hadoop
☆558 · Jun 27, 2014 · Updated 12 years ago
iconara / piglet
Piglet is a DSL for writing Pig scripts in Ruby
☆83 · Jul 21, 2010 · Updated 16 years ago
Cascading / pattern
Machine Learning for Cascading
☆85 · Jun 12, 2015 · Updated 11 years ago
LinkedInAttic / camus
LinkedIn's previous generation Kafka to HDFS pipeline.
☆881 · Aug 27, 2020 · Updated 5 years ago
mesos / spark
Lightning-fast cluster computing in Java, Scala and Python.
☆1,419 · Apr 8, 2014 · Updated 12 years ago
infochimps-labs / wonderdog
Bulk loading for Elasticsearch
☆186 · Dec 16, 2023 · Updated 2 years ago
traviscrawford / scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensib…
☆112 · May 17, 2011 · Updated 15 years ago
nathanmarz / cascalog
Data processing on Hadoop without the hassle.
☆1,373 · May 18, 2023 · Updated 3 years ago
LinkedInAttic / white-elephant
Hadoop log aggregator and dashboard
☆190 · Oct 29, 2013 · Updated 12 years ago
alanfgates / programmingpig
Data and example code for Programming Pig, by Alan F. Gates
☆186 · Oct 15, 2016 · Updated 9 years ago
spullara / havrobase
Use Avro to store all your values in HBase instead of regular columns
☆76 · Dec 1, 2017 · Updated 8 years ago
jghoman / haivvreo
Hive + Avro. Serde for working with Avro in Hive
☆60 · Dec 16, 2023 · Updated 2 years ago
OpenTSDB / asynchbase
A fully asynchronous, non-blocking, thread-safe, high-performance HBase client.
☆610 · May 19, 2023 · Updated 3 years ago
edwardcapriolo / filecrush
Remedy small files by combining them into larger ones.
☆196 · Jul 1, 2022 · Updated 4 years ago
wilbur / Piggybank
A repository of user-defined functions for Apache Pig
☆16 · Sep 20, 2010 · Updated 15 years ago
zohmg / zohmg
Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase.
☆173 · Oct 16, 2012 · Updated 13 years ago
akkumar / hbasene
HBase as the backing store for the TF-IDF representations for Lucene
☆110 · May 14, 2010 · Updated 16 years ago
julienledem / Pig-scripting-examples
Examples of using Pig's scripting-language capabilities
☆39 · Aug 1, 2016 · Updated 9 years ago
jeromatron / pygmalion
A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link.
☆84 · Aug 21, 2014 · Updated 11 years ago
lintool / Cloud9
Cloud9 is a Hadoop toolkit for working with big data
☆237 · Dec 15, 2015 · Updated 10 years ago
twitter / algebird
Abstract Algebra for Scala
☆2,299 · Nov 21, 2025 · Updated 8 months ago
addthis / stream-lib
Stream summarizer and cardinality estimator.
☆2,265 · Nov 28, 2019 · Updated 6 years ago
twitter-archive / pycascading
A Python wrapper for Cascading
☆220 · Dec 30, 2019 · Updated 6 years ago
jzachr / goldenorb
GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework
☆293 · Jun 29, 2022 · Updated 4 years ago
klbostee / dumbo
Python module that allows one to easily write and run Hadoop programs.
☆1,030 · Jan 9, 2018 · Updated 8 years ago
forcedotcom / phoenix
☆559 · Feb 12, 2022 · Updated 4 years ago