VishvendraRana / spark-custom-datasource

☆13

Alternatives and similar repositories for spark-custom-datasource

Users who are interested in spark-custom-datasource are comparing it to the libraries listed below.

dataArtisans / performance
Flink performance tests
☆20Updated 9 years ago
zrlio / albis
Albis: High-Performance File Format for Big Data Systems
☆21Updated 6 years ago
conversant / spark-profiler
☆12Updated 8 years ago
zrlio / parquet-generator
Parquet file generator
☆22Updated 7 years ago
dataArtisans / cascading-flink
Cascading on Apache Flink®
☆54Updated last year
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆72Updated 4 years ago
stratosphere / stratosphere
Stratosphere is now Apache Flink.
☆197Updated last year
lightcopy / parquet-index
Spark SQL index for Parquet tables
☆134Updated 4 years ago
cerndb / Hadoop-Profiler
Hadoop Profiler, or hprofiler, is a tool which is able to analyze on- and off-CPU workloads on distributed computing environments.
☆24Updated 8 years ago
ExpediaGroup / hiveberg
Demonstration of a Hive Input Format for Iceberg
☆26Updated 4 years ago
ibm-research-ireland / sparkoscope
Enabling Spark Optimization through Cross-stack Monitoring and Visualization
☆47Updated 7 years ago
otherwise777 / Temporal_Graph_library
Temporal_Graph_library
☆25Updated 6 years ago
TIBCOSoftware / snappy-examples
Use cases built on SnappyData. Use cases contained here: 1. Ad Analytics 2. Streaming data ingestion from RabbitMQ.
☆32Updated 2 years ago
milinda / calcite-tutorial
Apache Calcite Tutorial
☆33Updated 9 years ago
trinodb / tpch
Port of TPC-H dbgen to Java
☆50Updated 8 months ago
gyfora / StreamKV
A streaming key-value store implementation using native Flink Streaming operators
☆23Updated 9 years ago
anha1 / fluorite
Fluorite: Apache Calcite trace analyzer
☆12Updated 6 years ago
twilmes / sql-gremlin
Provides a SQL interface to your TinkerPop enabled graph db
☆74Updated 2 years ago
nielsbasjes / splittablegzip
Splittable Gzip codec for Hadoop
☆70Updated last month
ExpediaGroup / jasvorno
A library for strong, schema based conversion between 'natural' JSON documents and Avro
☆18Updated last year
markgrover / spark-secure-kafka-app
Sample Spark Streaming application for secure consumption from Kafka
☆33Updated 8 years ago
collectivemedia / celos
Scriptable scheduler for periodical Hadoop workflows
☆22Updated 7 years ago
uber / uberscriptquery
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
☆61Updated last year
steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20Updated 4 years ago
ExpediaGroup / circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
☆88Updated last year
zrlio / crail-spark-io
Fast I/O plugins for Spark
☆41Updated 4 years ago
Teradata / tpcds
Port of TPC-DS dsdgen to Java
☆50Updated 10 months ago
flipkart-incubator / spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
☆42Updated 7 years ago
peelframework / peel
Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms.
☆27Updated 2 years ago
pnowojski / simd-blog
Source code for SIMD benchmarks and experiments in Java
☆32Updated 7 years ago