echen / data-hacks
Command-line utilities for data analysis.
☆18 Updated 14 years ago
Alternatives and similar repositories for data-hacks:
Users who are interested in data-hacks are comparing it to the libraries listed below.
- HBase adapters for Cascading ☆46 Updated 15 years ago
- ☆33 Updated 6 years ago
- It counts ☆61 Updated 12 years ago
- Cantor provides utilities for estimating the cardinality of large sets. ☆83 Updated 2 years ago
- A very memory-efficient trie (radix tree) implementation ☆47 Updated 12 years ago
- A bunch of utility classes for Java, Hadoop, HBase, Pig, etc. ☆76 Updated 10 years ago
- aggregate composite metrics for cassandra using counters ☆16 Updated 13 years ago
- Sample code for Cascalog on Hadoop, a New Hope ☆20 Updated 11 years ago
- Realtime Analytics ☆41 Updated 12 years ago
- DDSL - Dynamic Distributed Service Locator ☆102 Updated 9 years ago
- recordbus: mysql binlog to apache kafka ☆80 Updated 9 years ago
- Explorations relative to cloning FlumeJava ☆93 Updated 4 years ago
- Collect local Mesos slave, underlying operating system and machine metrics and produce to Apache Kafka ☆20 Updated 9 years ago
- Originally for monthly table partitions, more info at [imperialwicket.com](http://imperialwicket.com/postgresql-automating-monthly-table-… ☆43 Updated 9 years ago
- Lucene based indexing in Cassandra ☆61 Updated 8 years ago
- A plugin for flume that allows you to use Cassandra as a sink. ☆59 Updated 13 years ago
- Realtime Analytics ☆68 Updated 12 years ago
- Tool to help users migrate large relational databases into Hadoop clusters. ☆67 Updated 12 years ago
- Redesign to eliminate all string identifiers and hide partitioning details from app developer. ☆16 Updated 13 years ago
- A toy school project intended to be an approximate clone of Google's Megastore database for geographically-distributed scalable fault-to… ☆35 Updated 13 years ago
- UNRELEASED. An opinionated framework for analytics-on-write on event streams using key-value storage ☆14 Updated 9 years ago
- Presto connector to Amazon Kinesis service. ☆14 Updated 5 years ago
- Unix tee, but for Kinesis streams ☆12 Updated 3 years ago
- A small Scala library for writing specs as simple classes and methods (no longer maintained). ☆38 Updated 7 years ago
- Safe daemonization from within Java ☆74 Updated 7 years ago
- Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase. ☆174 Updated 12 years ago
- Open source framework for predictive modeling on Apache Hadoop ☆34 Updated 10 years ago
- A restful web application for real-time typeahead and autocomplete ☆105 Updated 12 years ago
- Apache Solr Client for Scala/Java ☆51 Updated 9 years ago
- Probabilistic data structures server. The data model is key-value, where values are: Bloomfilters, LinearCounters, HyperLogLogs, CountMin… ☆25 Updated 9 years ago