Data quality control tool built on Spark and Deequ
☆25 · Jan 22, 2026 · Updated last month
Alternatives and similar repositories for data-flare
Users who are interested in data-flare are comparing it to the libraries listed below
- Some Avro operations in Scala ☆10 · Feb 9, 2026 · Updated 2 weeks ago
- A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark. ☆13 · Oct 27, 2021 · Updated 4 years ago
- Some random how-to examples relating to Databricks. ☆15 · Nov 3, 2021 · Updated 4 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 · Jan 22, 2024 · Updated 2 years ago
- End-to-end Machine Learning Pipeline demo using Delta Lake, MLflow and AzureML in Azure Databricks ☆18 · Nov 9, 2019 · Updated 6 years ago
- Azure AI Camp - 2 day workshop on Databricks and Azure ML ☆20 · Jul 23, 2023 · Updated 2 years ago
- Deriving Spark DataFrame schemas from case classes ☆44 · Jun 24, 2024 · Updated last year
- MLOps Lab Example using PyTorch to predict Yelp Reviews ☆21 · Mar 20, 2021 · Updated 4 years ago
- Code Repository for Talk: Managing an Akka Cluster on Kubernetes ☆23 · Jul 11, 2019 · Updated 6 years ago
- An example of building a Kubernetes operator (Flink) using the Abstract Operator framework ☆26 · Jul 12, 2019 · Updated 6 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 · Sep 6, 2024 · Updated last year
- ☆38 · May 22, 2024 · Updated last year
- Django with Data Science [Video], published by Packt ☆12 · Dec 15, 2025 · Updated 2 months ago
- Scala framework for collecting performance metrics and conducting sound experimental benchmarking. ☆13 · Nov 19, 2025 · Updated 3 months ago
- FederatedCatalog ☆11 · Updated this week
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 · Dec 31, 2024 · Updated last year
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 · Mar 14, 2021 · Updated 4 years ago
- My Study guide used to pass the CRT020 Spark Certification exam ☆34 · Jan 6, 2020 · Updated 6 years ago
- I'll munch some data here ☆12 · Jun 18, 2021 · Updated 4 years ago
- Python library & CLI to create, view and edit PFB files ☆12 · Feb 19, 2026 · Updated last week
- Apache Spark-based framework for analyzing A/B experiments ☆15 · Nov 3, 2024 · Updated last year
- GRASS GIS module for wildfire simulation wrapping r.ros and r.spread modules ☆11 · Dec 13, 2021 · Updated 4 years ago
- A command-line tool to dynamically provision and manage Mesos clusters and their applications ☆35 · Oct 11, 2016 · Updated 9 years ago
- DataQuality for BigData ☆148 · Dec 15, 2023 · Updated 2 years ago
- A Scala library for locality sensitive hashing ☆14 · Aug 1, 2018 · Updated 7 years ago
- Sangria akka-streams integration ☆11 · Feb 8, 2026 · Updated 3 weeks ago
- Azure-Sentinel-BYOML ☆12 · Nov 8, 2019 · Updated 6 years ago
- PyTorch implementation of Neural Style Transfer ☆10 · Jun 22, 2021 · Updated 4 years ago
- ☆11 · Mar 27, 2024 · Updated last year
- Scalable genomic analysis pipelines, written in WDL ☆11 · Updated this week
- Hadoop/Hive/Spark container to perform CI tests ☆10 · Dec 26, 2020 · Updated 5 years ago
- ☆34 · Jan 19, 2026 · Updated last month
- A single source of truth for data definitions ☆11 · Dec 10, 2022 · Updated 3 years ago
- This is a list of YAML file examples for Docker, Kubernetes, Ansible. Also includes a Python script. ☆10 · Jan 12, 2021 · Updated 5 years ago
- Gene Prediction using MAKER, CEGMA, SNAP, GENEMARK & AUGUSTUS ☆10 · Jul 20, 2017 · Updated 8 years ago
- Facilitates collaboration and governance for all participants in a Data Space. ☆13 · Updated this week
- Simple videoconferencing service created using Twilio's Programmable Video Group Rooms API ☆10 · May 24, 2018 · Updated 7 years ago
- A clean online résumé (CV) ☆13 · Jun 6, 2024 · Updated last year
- Apache Amaterasu ☆56 · Oct 18, 2019 · Updated 6 years ago