rheem-ecosystem / rheem

Rheem - a cross-platform data processing system

☆5

Alternatives and similar repositories for rheem

Users who are interested in rheem are comparing it to the libraries listed below.

qubole / rubix
Cache File System optimized for columnar formats and object stores
☆183 Updated 2 years ago
qubole / quark
Quark is a data virtualization engine over analytic databases.
☆98 Updated 8 years ago
lightcopy / parquet-index
Spark SQL index for Parquet tables
☆134 Updated 4 years ago
maropu / spark-sql-server
Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol
☆34 Updated 2 years ago
michaelmior / calcite-notebooks
A series of Jupyter notebooks to demonstrate the functionality of Apache Calcite
☆59 Updated 4 years ago
rymurr / flight-spark-source
☆106 Updated 2 years ago
vasia / gelly-streaming
An experimental Graph Streaming API for Apache Flink
☆141 Updated 4 years ago
TU-Berlin-DIMA / scotty-window-processor
This repository provides Scotty, a framework for efficient window aggregations for out-of-order Stream Processing.
☆78 Updated last year
maropu / spark-tpcds-datagen
All the things about TPC-DS in Apache Spark
☆106 Updated 2 years ago
apache / datasketches
Apache datasketches
☆97 Updated 2 years ago
dataArtisans / yahoo-streaming-benchmark
An extension of Yahoo's Benchmarks
☆107 Updated last year
vlsi / calcite-test-dataset
Data sets and Vagrant script to provision a virtual machine for Apache Calcite development
☆30 Updated 2 years ago
TU-Berlin-DIMA / Condor
Condor allows for the specification of synopsis-based streaming jobs on top of general dataflow systems. Condor provides a collection of …
☆13 Updated last year
apache / datasketches-website
Website for DataSketches.
☆103 Updated 2 weeks ago
oap-project / sql-ds-cache
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
☆37 Updated 2 years ago
dbs-leipzig / gradoop
Distributed Temporal Graph Analytics with Apache Flink
☆248 Updated this week
verdict-project / verdict
Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing
☆250 Updated 4 years ago
ehiggs / spark-terasort
Spark Terasort
☆121 Updated 2 years ago
gyfora / StreamKV
A streaming key-value store implementation using native Flink Streaming operators
☆23 Updated 9 years ago
substrait-io / substrait-java
☆86 Updated this week
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆91 Updated 2 months ago
microsoft / Dhalion
Self-regulation and auto-tuning for distributed systems
☆65 Updated 2 years ago
spirom / spark-data-sources
Developing Spark External Data Sources using the V2 API
☆48 Updated 7 years ago
hydromatic / quidem
Idempotent query executor
☆52 Updated 3 months ago
apache / datasketches-hive
Sketch adaptors for Hive.
☆50 Updated 5 months ago
starburstdata / facebook-presto
Starburst Enterprise Distribution of Presto
☆45 Updated 3 years ago
lsds / StreamBench
Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark
☆53 Updated 6 years ago
squito / spark-memory
A tool to get better debug info on Spark's memory usage
☆42 Updated 5 years ago
criteo / babar
Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN.
☆127 Updated 6 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 Updated 4 years ago