yahoo / validatar
Functional testing framework for Big Data pipelines.
☆56 · Updated last year
Alternatives and similar repositories for validatar
Users that are interested in validatar are comparing it to the libraries listed below
- Schema Registry integration for Apache Spark ☆40 · Updated 2 years ago
- Wikipedia stream-processing demo using Kafka Connect and Kafka Streams. ☆75 · Updated 7 years ago
- Schedoscope is a scheduling framework for pain-free agile development, testing, (re)loading, and monitoring of your datahub, lake, or what… ☆96 · Updated 5 years ago
- Magic to help Spark pipelines upgrade ☆35 · Updated 8 months ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- A tool for scale and performance testing of HDFS with a specific focus on the NameNode. ☆131 · Updated last year
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 4 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Updated 6 years ago
- Quark is a data virtualization engine over analytic databases. ☆98 · Updated 7 years ago
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 · Updated last year
- Splittable Gzip codec for Hadoop ☆70 · Updated this week
- Dione - a Spark and HDFS indexing library ☆52 · Updated last year
- A utility for generating Oozie workflows from a YAML definition ☆48 · Updated 6 years ago
- Spooker is a dynamic framework for processing high-volume data streams via processing pipelines ☆29 · Updated 9 years ago
- An open-source unit test framework for Hive queries based on JUnit 4 and 5 ☆257 · Updated 5 months ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆43 · Updated 2 weeks ago
- Type-class-based data cleansing library for Apache Spark SQL ☆78 · Updated 6 years ago
- A library you can include in your Spark job to validate the counters and perform operations on success. Goal is Scala/Java/Python support… ☆109 · Updated 7 years ago
- A library for strong, schema-based conversion between 'natural' JSON documents and Avro ☆18 · Updated last year
- ☆63 · Updated 5 years ago
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Updated 5 months ago
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. ☆127 · Updated 6 years ago
- Provides a Pythonic interface for reading and writing Avro schemas ☆27 · Updated 2 years ago
- Spark Structured Streaming State Tools ☆34 · Updated 4 years ago
- Spark cloud integration: tests, cloud committers and more ☆19 · Updated 4 months ago
- Big Data ETL and utilities for Hadoop MapReduce, Spark and Storm ☆102 · Updated last year
- Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple… ☆26 · Updated 4 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 · Updated 9 months ago
- ☆61 · Updated 6 years ago
- Library for organizing batch processing pipelines in Apache Spark ☆41 · Updated 8 years ago