lintool / my-data-is-bigger-than-your-data
My data is bigger than your data!
☆38Updated 5 years ago
Alternatives and similar repositories for my-data-is-bigger-than-your-data:
Users who are interested in my-data-is-bigger-than-your-data are comparing it to the libraries listed below:
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor…☆41Updated 2 years ago
- A template-based cluster provisioning system☆61Updated last year
- A Cascading Workflow Visualizer☆83Updated last year
- A/B experiments service☆33Updated this week
- ☆49Updated 7 years ago
- A java library for stored queries☆16Updated last year
- NuCypher for Kafka. Start building from this module (it fetches the appropriate branch from Kafka repository)☆18Updated 7 years ago
- Muppet☆126Updated 3 years ago
- Supporting material (code, schemas etc) for Unified Log Processing (Manning Publications)☆97Updated 2 years ago
- ☆43Updated 3 years ago
- Luigi Plugin for Hubot☆35Updated 8 years ago
- Embedded Kafka for testing and quick prototyping.☆14Updated 8 years ago
- dynamically parse protobuf message then convert to avro☆25Updated 9 years ago
- Query testing framework☆68Updated 2 months ago
- Java and Scala client libraries for Concord☆13Updated 8 years ago
- Cantor provides utilities for estimating the cardinality of large sets.☆83Updated 2 years ago
- Cascading on Apache Flink®☆54Updated last year
- Luigi Workflow Engine integration for Treasure Data☆16Updated 6 years ago
- demo clients☆20Updated 7 years ago
- Last-seen sketch implementation in Go☆16Updated 4 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible.☆51Updated 7 years ago
- An example project for doing grid search in MLlib☆13Updated 10 years ago
- A framework to benchmark different graph databases, based on generated data from customizable schema, distribution, and size.☆26Updated 6 years ago
- Example code for building your own MemSQL Streamliner Pipelines☆23Updated 7 years ago
- Lossy Counting and Sticky Sampling implementation for efficient frequency counts on data streams.☆63Updated 8 years ago
- An analysis of adverse drug event data using Hadoop, R, and Gephi☆44Updated 9 years ago
- Time series analysis with Apache Spark based on Chronix☆38Updated 7 years ago
- A collection of Scala graph libraries and adapters for graph databases.☆14Updated 8 years ago
- A scalable, distributed Time Series Database.☆28Updated 10 years ago
- Spooker is a dynamic framework for processing high volume data streams via processing pipelines☆29Updated 9 years ago