DataQuality for BigData
☆148Dec 15, 2023Updated 2 years ago
Alternatives and similar repositories for DataQuality
Users who are interested in DataQuality are comparing it to the libraries listed below
- Spark package for checking data quality☆223Feb 28, 2020Updated 6 years ago
- WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging …☆31Oct 28, 2025Updated 4 months ago
- Yet Another SPark Framework☆10Feb 5, 2023Updated 3 years ago
- ☆38May 22, 2024Updated last year
- The premier open source Data Quality solution☆647Dec 19, 2025Updated 2 months ago
- Tutorial and examples of Data Quality in Big Data System☆11Apr 25, 2017Updated 8 years ago
- Avro Schema Evolution made easy☆36Feb 8, 2024Updated 2 years ago
- A Spark datasource for the HadoopOffice library☆36Sep 29, 2025Updated 5 months ago
- Data quality control tool built on spark and deequ☆25Jan 22, 2026Updated last month
- Tool to automate data quality checks on data pipelines☆256Sep 10, 2022Updated 3 years ago
- ☆21Aug 7, 2025Updated 6 months ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:…☆72Jan 1, 2023Updated 3 years ago
- Automated Continuous Data Quality Measurement☆12Nov 15, 2023Updated 2 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities☆26Jan 27, 2025Updated last year
- Data Lineage Tracking And Visualization Solution☆656Feb 16, 2026Updated last week
- Spark self-study handbook covering spark core, spark sql, spark streaming, spark-kafka, and delta-lake, plus basic Scala exercises and source-code analysis (e.g. master, shuffle), summaries, and translations.☆18Jul 19, 2023Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.☆3,583Feb 17, 2026Updated last week
- HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)☆62Sep 29, 2025Updated 5 months ago
- Mirror of Apache griffin☆1,174Aug 3, 2025Updated 6 months ago
- Witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. The Starter Kit showcases the …☆26Jan 27, 2026Updated last month
- The code repository for the "Pulsar in Action" book by Manning press☆48Jun 27, 2025Updated 8 months ago
- Practical utilities for spark applications☆11Jan 16, 2024Updated 2 years ago
- Mirror of Apache Beam☆10Jan 27, 2021Updated 5 years ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course given by Patrick Varilly to one of o…☆11Feb 4, 2016Updated 10 years ago
- The code for the in memory data pipeline that was presented at Berlin Buzzwords 2015.☆10Jun 1, 2015Updated 10 years ago
- Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies…☆1,113Jan 12, 2023Updated 3 years ago
- Data Engineering with Scala, published by Packt☆28Feb 5, 2024Updated 2 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆73Mar 14, 2021Updated 4 years ago
- Custom Alerts for Ambari server☆12Jul 27, 2015Updated 10 years ago
- A compendium of data projects and associated blog posts☆10Nov 4, 2019Updated 6 years ago
- Sketching data structures for scala, including t-digest☆15Sep 7, 2021Updated 4 years ago
- Azure Synapse Analytics Samples☆14Feb 15, 2023Updated 3 years ago
- ☆12Updated this week
- Essential Spark extensions and helper methods ✨😲☆766Sep 14, 2025Updated 5 months ago
- Convert a CSV file to ORCFile☆26Apr 10, 2019Updated 6 years ago
- A project to create a stub/mock environment for testing ExecuteScript processors☆31Aug 10, 2018Updated 7 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)☆454Feb 8, 2026Updated 2 weeks ago
- Log tailing and parsing framework in Java☆28Nov 14, 2014Updated 11 years ago
- Herd-UI is a search and discovery tool for business and technical users. Everyone in your organization can use Herd-UI to browse and unde…☆16Oct 1, 2022Updated 3 years ago