twitter/hadoop-lzo

Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20

☆548

Alternatives and similar repositories for hadoop-lzo

Users that are interested in hadoop-lzo are comparing it to the libraries listed below.

twitter / elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
☆1,134 · Apr 10, 2023 · Updated 3 years ago
toddlipcon / hadoop-lzo-packager
Packaging utilities for GPL compression libraries in Hadoop
☆34 · Jun 7, 2012 · Updated 14 years ago
kevinweil / hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
☆100 · Jan 10, 2012 · Updated 14 years ago
twitter-archive / elephant-twin-lzo
Elephant Twin LZO uses Elephant Twin to create LZO block indexes
☆15 · Jun 13, 2012 · Updated 14 years ago
traviscrawford / scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensib…
☆112 · May 17, 2011 · Updated 15 years ago
twitter-archive / elephant-twin
Elephant Twin is a framework for creating indexes in Hadoop
☆99 · Oct 12, 2020 · Updated 5 years ago
YahooArchive / oozie
Oozie - workflow engine for Hadoop
☆373 · Jun 8, 2017 · Updated 9 years ago
kevinweil / FileSetInputFormat
A Hadoop input format for sending lists of files as keys to a mapper. Set the list of files, and an input split will be created per file…
☆16 · Apr 7, 2010 · Updated 16 years ago
toddlipcon / hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
☆37 · Aug 13, 2012 · Updated 13 years ago
twitter-archive / hdfs-du
Visualize your HDFS cluster usage
☆228 · Oct 13, 2020 · Updated 5 years ago
YahooArchive / howl
Common metadata layer for Hadoop's Map Reduce, Pig, and Hive
☆77 · Feb 17, 2011 · Updated 15 years ago
twitter / hraven
hRaven collects run time data and statistics from MapReduce jobs in an easily queryable format
☆129 · Jan 14, 2022 · Updated 4 years ago
kevinweil / stream-to-hdfs
A simple utility for streaming stdin to a file in HDFS
☆25 · Feb 4, 2010 · Updated 16 years ago
akkumar / hbasene
HBase as the backing store for the TF-IDF representations for Lucene
☆110 · May 14, 2010 · Updated 16 years ago
toddlipcon / haatkit
Toolkit of simple scripts useful for managing Hadoop
☆16 · May 3, 2012 · Updated 14 years ago
cloudera / flume
WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for effici…
☆943 · May 26, 2021 · Updated 5 years ago
cwensel / cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
☆355 · Apr 8, 2025 · Updated last year
nathanmarz / elephantdb
Distributed database specialized in exporting key/value data from Hadoop
☆558 · Jun 27, 2014 · Updated 12 years ago
twitter-archive / ambrose
A platform for visualization and real-time monitoring of data workflows
☆1,170 · Jan 22, 2020 · Updated 6 years ago
ghelmling / beeno
Simple Java Beans mapping for HBase
☆24 · Jul 11, 2012 · Updated 14 years ago
twitter / scalding
A Scala API for Cascading
☆3,522 · May 28, 2023 · Updated 3 years ago
amplab / shark
Development in Shark has been ended.
☆992 · Aug 11, 2015 · Updated 10 years ago
kevinweil / pig.tmbundle
Simple syntax highlighting for writing Pig scripts (http://hadoop.apache.org/pig) in Textmate.
☆35 · May 2, 2013 · Updated 13 years ago
facebookarchive / hadoop-20
Facebook's Realtime Distributed FS based on Apache Hadoop 0.20-append
☆874 · Oct 10, 2014 · Updated 11 years ago
ThinkBigAnalytics / colossal-pipe
The Colossal Pipe framework for map/reduce processing.
☆29 · Aug 19, 2014 · Updated 11 years ago
electrum / hadoop-snappy
Snappy compression for Hadoop
☆41 · Jun 18, 2015 · Updated 11 years ago
klbostee / dumbo
Python module that allows one to easily write and run Hadoop programs.
☆1,030 · Jan 9, 2018 · Updated 8 years ago
LinkedInAttic / white-elephant
Hadoop log aggregator and dashboard
☆190 · Oct 29, 2013 · Updated 12 years ago
tdunning / Plume
Explorations relative to cloning FlumeJava
☆94 · Oct 13, 2020 · Updated 5 years ago
apache / whirr
Mirror of Apache Whirr
☆96 · Apr 28, 2017 · Updated 9 years ago
twitter-archive / scribe
A Ruby client library for Scribe
☆90 · Mar 1, 2011 · Updated 15 years ago
twitter-archive / pycascading
A Python wrapper for Cascading
☆220 · Dec 30, 2019 · Updated 6 years ago
nathanmarz / storm
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
☆8,772 · Aug 16, 2017 · Updated 8 years ago
LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
LinkedInAttic / camus
LinkedIn's previous generation Kafka to HDFS pipeline.
☆881 · Aug 27, 2020 · Updated 5 years ago
Karmasphere / lzo-java
Pure Java implementation of the liblzo2 LZO compression algorithm
☆48 · Jan 20, 2012 · Updated 14 years ago
facebookarchive / scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers.
☆3,912 · Aug 27, 2020 · Updated 5 years ago
anthonyu / Sizzle
A compiler and runtime for Google's Sawzall language, optimized for Hadoop
☆41 · Apr 26, 2013 · Updated 13 years ago
larsgeorge / hbase-explorer
Hue based HBase Explorer
☆25 · Dec 14, 2010 · Updated 15 years ago