ExpediaGroup / corc
An ORC File Scheme for the Cascading data processing platform.
☆14 · Updated 3 years ago
Alternatives and similar repositories for corc
Users who are interested in corc are comparing it to the libraries listed below.
- A unit testing framework for the Cascading data processing platform. ☆25 · Updated 3 years ago
- A library to expose more of Apache Spark's metrics system ☆146 · Updated 5 years ago
- A library for strong, schema based conversion between 'natural' JSON documents and Avro ☆18 · Updated last year
- Hadoop output committers for S3 ☆109 · Updated 4 years ago
- Collection of utilities to allow writing java code that operates across a wide range of avro versions. ☆79 · Updated 3 weeks ago
- Measure behavior of Java applications ☆42 · Updated 3 years ago
- Fast Apache Avro serialization/deserialization library ☆44 · Updated 4 years ago
- High performance native memory access for Java. ☆125 · Updated 3 weeks ago
- A user friendly API for checking for and reporting on Avro schema incompatibilities. ☆59 · Updated last year
- Big Data Toolkit for the JVM ☆146 · Updated 4 years ago
- Probabilistic data structures for Guava. ☆54 · Updated 4 years ago
- ☆13 · Updated 7 years ago
- DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees. ☆120 · Updated last month
- ☆105 · Updated last year
- Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. ☆127 · Updated 6 years ago
- The Schema Repo is a RESTful web service for storing and serving mappings between schema identifiers and schema definitions. ☆156 · Updated 2 years ago
- XPath likeness for Avro ☆35 · Updated 2 years ago
- A native Kafka protocol proxy for Apache Kafka ☆21 · Updated 7 years ago
- Custom state store providers for Apache Spark ☆92 · Updated 3 months ago
- Scala and SQL happy together. ☆29 · Updated 8 years ago
- Sketch adaptors for Hive. ☆50 · Updated 3 months ago
- Utilities for processing Flink checkpoints/savepoints ☆74 · Updated 5 years ago
- ☆172 · Updated 3 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- Bloomfilter support for Facebook Presto (prestodb.io) ☆25 · Updated 2 years ago
- Quark is a data virtualization engine over analytic databases. ☆98 · Updated 7 years ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka ☆29 · Updated 8 years ago
- A lightweight workflow definition library ☆153 · Updated 2 years ago
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization ☆47 · Updated 7 years ago
- Schema Registry integration for Apache Spark ☆40 · Updated 2 years ago