google / zetasketch
A collection of libraries for single-pass, distributed, sublinear-space approximate aggregation and sketching algorithms. Currently: HyperLogLog++; more to come.
☆156 Updated 2 weeks ago
Alternatives and similar repositories for zetasketch
Users who are interested in zetasketch are comparing it to the libraries listed below.
- Apache datasketches ☆95 Updated 2 years ago
- DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees. ☆120 Updated last month
- Cache File System optimized for columnar formats and object stores ☆182 Updated 2 years ago
- Union, intersection, and set cardinality in loglog space ☆57 Updated last year
- GCS support for avro-tools, parquet-tools and protobuf ☆75 Updated last month
- Collection of utilities to allow writing java code that operates across a wide range of avro versions. ☆79 Updated 3 weeks ago
- Hadoop output committers for S3 ☆109 Updated 4 years ago
- DBeam exports SQL tables into Avro files using JDBC and Apache Beam ☆195 Updated this week
- High performance native memory access for Java. ☆125 Updated 3 weeks ago
- Harry for Apache Cassandra® ☆54 Updated 9 months ago
- Cantor provides utilities for estimating the cardinality of large sets. ☆83 Updated 3 years ago
- Graph Analytics with Apache Kafka ☆104 Updated this week
- A framework for rapid reporting API development; with out of the box support for high cardinality dimension lookups with druid. ☆130 Updated 4 months ago
- An ORC File Scheme for the Cascading data processing platform. ☆14 Updated 3 years ago
- Sketch adaptors for Hive. ☆50 Updated 3 months ago
- Splittable Gzip codec for Hadoop ☆70 Updated 3 weeks ago
- ☆105 Updated last year
- Apache Beam Site ☆29 Updated 2 weeks ago
- Idempotent query executor ☆51 Updated last month
- This repository provides Scotty, a framework for efficient window aggregations for out-of-order Stream Processing. ☆77 Updated last year
- Spark SQL index for Parquet tables ☆134 Updated 4 years ago
- Immutable key/value store with efficient space utilization and fast reads. They are ideal for the use-case of tables built by batch proce… ☆98 Updated last year
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 Updated 6 months ago
- Self regulation and auto-tuning for distributed system ☆65 Updated last year
- A Scalable Concurrent Key-Value Map for Big Data Analytics ☆270 Updated last year
- Google Dataflow Runner for Apache Flink™ (deprecated; please use the up-to-date Beam Runner) ☆88 Updated 8 years ago
- ☆84 Updated last week
- Website for DataSketches. ☆102 Updated this week
- Enabling queries on compressed data. ☆279 Updated last year
- HyperLogLog (original and hyperloglog++) algorithm implementation in java. ☆81 Updated 4 years ago