yahoo / validatar
Functional testing framework for Big Data pipelines.
☆56 · Updated last year
Alternatives and similar repositories for validatar:
Users who are interested in validatar compare it to the libraries listed below.
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- Quark is a data virtualization engine over analytic databases. ☆98 · Updated 7 years ago
- A utility for generating Oozie workflows from a YAML definition ☆48 · Updated 6 years ago
- A rough prototype of a tool for discovering Apache Hive schemas from JSON documents. ☆42 · Updated last year
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 · Updated 9 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆41 · Updated 7 years ago
- Oozie Workflow to Airflow DAGs migration tool ☆87 · Updated last month
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆70 · Updated 2 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- A Spark-based data comparison tool at scale which helps software development engineers compare a plethora of pair combinations o… ☆50 · Updated last year
- Chef cookbook for the Airflow workflow management platform. ☆71 · Updated 5 years ago
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 · Updated last year
- Schedoscope is a scheduling framework for pain-free agile development, testing, (re)loading, and monitoring of your datahub, lake, or what… ☆96 · Updated 5 years ago
- Cloud-based Data Platform based on Apache Spark ☆26 · Updated 2 months ago
- ☆61 · Updated 5 years ago
- Schema Registry integration for Apache Spark ☆40 · Updated 2 years ago
- Helpful user-defined functions / table-generating functions for Hive ☆101 · Updated 8 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Updated 6 years ago
- Simple Spark example of generating table stats for use in data quality checks ☆28 · Updated 7 years ago
- ☆63 · Updated 5 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 4 years ago
- ☆16 · Updated 11 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 5 years ago
- DataQuality for BigData ☆144 · Updated last year
- A plugin for Apache Airflow that lets you run Spark Submit commands as an Operator ☆73 · Updated 5 years ago
- Splittable Gzip codec for Hadoop ☆70 · Updated last week
- SQL data model for working with Snowplow web data. Supports Redshift and Looker. Snowflake and BigQuery coming soon ☆60 · Updated 4 years ago
- A library you can include in your Spark job to validate the counters and perform operations on success. Goal is Scala/Java/Python support… ☆109 · Updated 7 years ago
- Provides a Pythonic interface for reading and writing Avro schemas ☆27 · Updated 2 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 · Updated 4 months ago