Create HTML profiling reports from Apache Spark DataFrames
☆197 · Feb 2, 2020 · Updated 6 years ago
Alternatives and similar repositories for spark-df-profiling
Users who are interested in spark-df-profiling are comparing it to the libraries listed below.
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames. ☆102 · Aug 20, 2019 · Updated 6 years ago
- pyspark methods to enhance developer productivity ☆684 · Mar 6, 2025 · Updated last year
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,540 · Dec 2, 2024 · Updated last year
- ☆11 · May 26, 2022 · Updated 3 years ago
- Movie Recommendation System Using Spark ML, Akka and Cassandra ☆12 · Oct 4, 2019 · Updated 6 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 6 months ago
- Create hadoop cluster in aws ec2 for development ☆11 · Sep 8, 2017 · Updated 8 years ago
- The code for the Sales Dashboard demo ☆16 · May 19, 2025 · Updated 9 months ago
- DataQuality for BigData ☆148 · Dec 15, 2023 · Updated 2 years ago
- ☆26 · Jul 9, 2023 · Updated 2 years ago
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ☆13,411 · Updated this week
- ☆15 · May 31, 2023 · Updated 2 years ago
- This code allows you to load any existing Azure Data Factory project file (*.dfproj) and perform further actions like "Export to ARM Temp… ☆26 · May 5, 2019 · Updated 6 years ago
- Spark package for checking data quality ☆223 · Feb 28, 2020 · Updated 6 years ago
- The easiest way to integrate Kedro and Great Expectations ☆54 · Dec 26, 2022 · Updated 3 years ago
- Joins for skewed datasets in Spark ☆57 · Aug 18, 2017 · Updated 8 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,594 · Feb 17, 2026 · Updated 2 weeks ago
- Single view demo ☆14 · Feb 13, 2016 · Updated 10 years ago
- python automatic data quality check toolkit ☆278 · Sep 15, 2020 · Updated 5 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 · Sep 8, 2022 · Updated 3 years ago
- Joblib Apache Spark Backend ☆249 · Apr 7, 2025 · Updated 11 months ago
- Examples of metadata driven SQL processes implemented in Databricks ☆16 · May 21, 2021 · Updated 4 years ago
- CLI for data platform ☆21 · Nov 12, 2025 · Updated 3 months ago
- SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC. ☆17 · Apr 6, 2021 · Updated 4 years ago
- Flowchart for debugging Spark applications ☆106 · Sep 25, 2024 · Updated last year
- ☆39 · Mar 4, 2019 · Updated 7 years ago
- Examples for High Performance Spark ☆16 · Oct 25, 2025 · Updated 4 months ago
- A Spark datasource for the HadoopOffice library ☆36 · Sep 29, 2025 · Updated 5 months ago
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL ☆46 · Dec 14, 2025 · Updated 2 months ago
- Looking at big data? Add a little salt. ☆59 · May 18, 2023 · Updated 2 years ago
- Spark extensions for business contexts ☆36 · Feb 19, 2020 · Updated 6 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆454 · Feb 8, 2026 · Updated last month
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp… ☆816 · Updated this week
- A Tree Search Library for Data Cleaning ☆22 · Feb 15, 2022 · Updated 4 years ago
- ☆23 · Jan 3, 2025 · Updated last year
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 · Mar 14, 2021 · Updated 4 years ago
- ☆19 · Mar 27, 2020 · Updated 5 years ago
- A Minimalistic Rust Implementation of Delta Sharing Server. ☆98 · Mar 17, 2025 · Updated 11 months ago
- Adds cross building functionality to Gradle for Scala based projects ☆19 · Aug 8, 2024 · Updated last year