yahoo / validatar
Functional testing framework for Big Data pipelines.
☆58 · Updated last year
Alternatives and similar repositories for validatar:
Users who are interested in validatar are comparing it to the libraries listed below.
- Splittable Gzip codec for Hadoop ☆69 · Updated last week
- ☆63 · Updated 5 years ago
- Quark is a data virtualization engine over analytic databases. ☆98 · Updated 7 years ago
- Schema Registry integration for Apache Spark ☆39 · Updated 2 years ago
- Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple… ☆26 · Updated 3 years ago
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy ☆61 · Updated last year
- Library for organizing batch processing pipelines in Apache Spark ☆41 · Updated 7 years ago
- functionstest ☆33 · Updated 8 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 3 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 · Updated 9 months ago
- DataQuality for BigData ☆143 · Updated last year
- A utility for generating Oozie workflows from a YAML definition ☆48 · Updated 5 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Updated 5 years ago
- A tool for scale and performance testing of HDFS with a specific focus on the NameNode. ☆131 · Updated 11 months ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 · Updated 3 years ago
- File compaction tool that runs on top of the Spark framework. ☆59 · Updated 5 years ago
- A slightly moist lipstick-on-pig clone for Apache Hive ☆23 · Updated last year
- Utilities for writing tests that use Apache Spark. ☆24 · Updated 5 years ago
- Schedoscope is a scheduling framework for pain-free agile development, testing, (re)loading, and monitoring of your datahub, lake, or what… ☆96 · Updated 5 years ago
- JUnit integration for testing the Apache Hive Metastore and HiveServer2 Thrift APIs ☆25 · Updated 2 months ago
- Scalable CDC pattern implemented using PySpark ☆18 · Updated 5 years ago
- A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support… ☆108 · Updated 6 years ago
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Updated 2 years ago
- Type-class-based data cleansing library for Apache Spark SQL ☆79 · Updated 5 years ago
- A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid. ☆129 · Updated 6 months ago
- Sketch adaptors for Hive. ☆49 · Updated 2 months ago
- A super simple utility for testing Apache Hive scripts locally for non-Java developers. ☆72 · Updated 7 years ago
- This is the example code repository for Getting Started with Impala by John Russell (O'Reilly Media) ☆22 · Updated 7 years ago