LinkedInAttic/datafu

Hadoop library for large-scale data processing, now an Apache Incubator project

☆581

Alternatives and similar repositories for datafu

Users interested in datafu are comparing it to the libraries listed below.

twitter / elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
☆1,134 · Apr 10, 2023 · Updated 3 years ago
twitter-archive / ambrose
A platform for visualization and real-time monitoring of data workflows
☆1,170 · Jan 22, 2020 · Updated 6 years ago
julienledem / Pig-scripting-examples
Examples of using Pig's scripting-language capabilities
☆39 · Aug 1, 2016 · Updated 9 years ago
tdunning / pig-vector
Mahout vector encoding for Pig
☆53 · Nov 20, 2022 · Updated 3 years ago
LinkedInAttic / white-elephant
Hadoop log aggregator and dashboard
☆190 · Oct 29, 2013 · Updated 12 years ago
mesos / spark
Lightning-fast cluster computing in Java, Scala and Python.
☆1,419 · Apr 8, 2014 · Updated 12 years ago
jeromatron / pygmalion
A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link.
☆84 · Aug 21, 2014 · Updated 11 years ago
jghoman / haivvreo
Hive + Avro. SerDe for working with Avro in Hive
☆60 · Dec 16, 2023 · Updated 2 years ago
alanfgates / programmingpig
Data and example code for Programming Pig, by Alan F. Gates
☆186 · Oct 15, 2016 · Updated 9 years ago
nathanmarz / storm
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
☆8,772 · Aug 16, 2017 · Updated 8 years ago
hbutani / SQLWindowing
SQL Windowing Functions for Hadoop
☆65 · Jun 20, 2022 · Updated 4 years ago
YahooArchive / oozie
Oozie - workflow engine for Hadoop
☆373 · Jun 8, 2017 · Updated 9 years ago
nathanmarz / elephantdb
Distributed database specialized in exporting key/value data from Hadoop
☆558 · Jun 27, 2014 · Updated 12 years ago
rjurney / Cloud-Stenography
Main Repo
☆15 · Jun 24, 2010 · Updated 16 years ago
amplab / shark
Development of Shark has ended.
☆992 · Aug 11, 2015 · Updated 10 years ago
twitter / scalding
A Scala API for Cascading
☆3,522 · May 28, 2023 · Updated 3 years ago
twitter / summingbird
Streaming MapReduce with Scalding and Storm
☆2,123 · Jan 19, 2022 · Updated 4 years ago
alienrobotwizard / sounder
A grouping of Apache Pig examples.
☆65 · Oct 13, 2020 · Updated 5 years ago
sonalgoyal / crux
Crux is a reporting application for HBase. Crux provides a simple web based graphical interface to access HBase, query data and create re…
☆100 · Apr 9, 2013 · Updated 13 years ago
Netflix / Lipstick
Pig visualization framework
☆466 · Mar 24, 2023 · Updated 3 years ago
LinkedInAttic / sensei
Distributed, real-time searchable database
☆541 · Jun 20, 2014 · Updated 12 years ago
cloudera / emailarchive
Hadoop for archiving email
☆23 · Sep 29, 2011 · Updated 14 years ago
cutting / trevni
A column file format
☆133 · Sep 25, 2012 · Updated 13 years ago
forcedotcom / phoenix
☆559 · Feb 12, 2022 · Updated 4 years ago
RevolutionAnalytics / RHadoop
RHadoop
☆760 · Nov 24, 2015 · Updated 10 years ago
sonalgoyal / hiho
Hadoop Data Integration with various databases, ftp servers, salesforce. Incremental update, dedup, append, merge your data on Hadoop.
☆92 · Apr 11, 2013 · Updated 13 years ago
cloudera / kitten
The fast and fun way to write YARN applications.
☆136 · Nov 14, 2018 · Updated 7 years ago
cloudera / flume
We have moved to the Apache Incubator (https://cwiki.apache.org/FLUME/). Flume is a distributed, reliable, and available service for effici…
☆943 · May 26, 2021 · Updated 5 years ago
jzachr / goldenorb
GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework
☆293 · Jun 29, 2022 · Updated 4 years ago
cloudera / ades
An analysis of adverse drug event data using Hadoop, R, and Gephi
☆44 · Jan 28, 2016 · Updated 10 years ago
infochimps-labs / wonderdog
Bulk loading for Elasticsearch
☆186 · Dec 16, 2023 · Updated 2 years ago
lintool / Cloud9
Cloud9 is a Hadoop toolkit for working with big data
☆237 · Dec 15, 2015 · Updated 10 years ago
mozilla-metrics / akela
A bunch of utility classes for Java, Hadoop, HBase, Pig, etc.
☆77 · Mar 31, 2014 · Updated 12 years ago
twitter / cassovary
Cassovary is a simple big graph processing library for the JVM
☆1,053 · Oct 8, 2021 · Updated 4 years ago
addthis / stream-lib
Stream summarizer and cardinality estimator.
☆2,265 · Nov 28, 2019 · Updated 6 years ago
cloudera / bigtop
Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a …
☆51 · Jul 4, 2011 · Updated 15 years ago
datasalt / pangool
Tuple MapReduce for Hadoop: Hadoop API made easy
☆57 · Jun 27, 2022 · Updated 4 years ago
twitter-archive / elephant-twin
Elephant Twin is a framework for creating indexes in Hadoop
☆99 · Oct 12, 2020 · Updated 5 years ago
madlib / archived_madlib
MADlib has moved to Apache MADlib (incubating). Please send pull requests to the Apache repository.
☆508 · Feb 9, 2018 · Updated 8 years ago