VishvendraRana / spark-custom-datasource
☆13, updated 8 years ago
Alternatives and similar repositories for spark-custom-datasource
Users who are interested in spark-custom-datasource are comparing it to the libraries listed below.
- Flink performance tests (☆20, updated 9 years ago)
- Albis: High-Performance File Format for Big Data Systems (☆21, updated 6 years ago)
- (☆12, updated 8 years ago)
- Parquet file generator (☆22, updated 7 years ago)
- Cascading on Apache Flink® (☆54, updated last year)
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. (☆72, updated 4 years ago)
- Stratosphere is now Apache Flink. (☆197, updated last year)
- Spark SQL index for Parquet tables (☆134, updated 4 years ago)
- Hadoop Profiler, or hprofiler, is a tool which is able to analyze on- and off-CPU workloads on distributed computing environments. (☆24, updated 8 years ago)
- Demonstration of a Hive Input Format for Iceberg (☆26, updated 4 years ago)
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization (☆47, updated 7 years ago)
- Temporal_Graph_library (☆25, updated 6 years ago)
- Use cases built on SnappyData. Use cases contained here: 1. Ad Analytics 2. Streaming data ingestion from RabbitMQ. (☆32, updated 2 years ago)
- Apache Calcite Tutorial (☆33, updated 9 years ago)
- Port of TPC-H dbgen to Java (☆50, updated 8 months ago)
- A streaming key-value store implementation using native Flink Streaming operators (☆23, updated 9 years ago)
- Fluorite: Apache Calcite trace analyzer (☆12, updated 6 years ago)
- Provides a SQL interface to your TinkerPop enabled graph db (☆74, updated 2 years ago)
- Splittable Gzip codec for Hadoop (☆70, updated last month)
- A library for strong, schema based conversion between 'natural' JSON documents and Avro (☆18, updated last year)
- Sample Spark Streaming application for secure consumption from Kafka (☆33, updated 8 years ago)
- Scriptable scheduler for periodical Hadoop workflows (☆22, updated 7 years ago)
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy (☆61, updated last year)
- Paper: A Zero-rename committer for object stores (☆20, updated 4 years ago)
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. (☆88, updated last year)
- Fast I/O plugins for Spark (☆41, updated 4 years ago)
- Port of TPC-DS dsdgen to Java (☆50, updated 10 months ago)
- Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies. (☆42, updated 7 years ago)
- Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms. (☆27, updated 2 years ago)
- Source code for SIMD benchmarks and experiments in Java (☆32, updated 7 years ago)