datastax / sstable-to-arrow
☆35 Updated last year
Related projects
Alternatives and complementary repositories for sstable-to-arrow
- ☆104 Updated last year
- ☆77 Updated this week
- Multi-hop declarative data pipelines ☆91 Updated 2 weeks ago
- Distributed System Testing as a Service ☆51 Updated 9 months ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 Updated 3 years ago
- Apache datasketches ☆88 Updated last year
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL ☆37 Updated 2 months ago
- The open source, pluggable, nosql benchmarking suite. ☆172 Updated this week
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 Updated 2 years ago
- Condor allows for the specification of synopsis-based streaming jobs on top of general dataflow systems. Condor provides a collection of … ☆13 Updated 5 months ago
- This repository provides Scotty, a framework for efficient window aggregations for out-of-order Stream Processing. ☆75 Updated last year
- Dione - a Spark and HDFS indexing library ☆50 Updated 8 months ago
- A composable framework for fast and scalable data analytics ☆57 Updated last year
- Point-in-Time optimizations for Apache Spark ☆29 Updated 10 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆56 Updated last year
- Idempotent query executor ☆50 Updated 10 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated 3 weeks ago
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De… ☆69 Updated this week
- A dual write proxy for Apache Cassandra ☆24 Updated 2 years ago
- Spark* shuffle plugin for support shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis… ☆21 Updated 8 months ago
- Redset is a dataset containing three months worth of user query metadata that ran on a selected sample of instances in the Amazon Redshif… ☆45 Updated 2 months ago
- Spark-Cassandra Bulk Reader CASSANDRA-16222 ☆22 Updated last year
- Java binding to Apache DataFusion ☆70 Updated this week
- Distributed tests for Apache Cassandra® ☆54 Updated this week
- Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms. ☆27 Updated 2 years ago
- Apache Pulsar - distributed pub-sub messaging system ☆15 Updated this week
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 3 years ago
- Union, intersection, and set cardinality in loglog space ☆54 Updated last year
- Data Sketches for Apache Spark ☆21 Updated last year
- Example: Convert Protobuf to Parquet using parquet-avro and avro-protobuf ☆29 Updated 9 years ago