type-class based data cleansing library for Apache Spark SQL
★78 · Jun 23, 2019 · Updated 6 years ago
Alternatives and similar repositories for cleanframes
Users that are interested in cleanframes are comparing it to the libraries listed below
- Essential Spark extensions and helper methods ✨😲 ★766 · Sep 14, 2025 · Updated 5 months ago
- Spark data profiling utilities ★23 · Nov 24, 2018 · Updated 7 years ago
- Expressive types for Spark. ★896 · Updated this week
- Spark package for checking data quality ★223 · Feb 28, 2020 · Updated 5 years ago
- Better bridge apache spark and postgresql ★23 · Sep 11, 2023 · Updated 2 years ago
- A tool to validate data, built around Apache Spark. ★100 · Feb 19, 2026 · Updated last week
- Big Data Toolkit for the JVM ★146 · Nov 4, 2020 · Updated 5 years ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ★432 · Jan 14, 2022 · Updated 4 years ago
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ★10 · Jul 31, 2023 · Updated 2 years ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ★94 · May 9, 2025 · Updated 9 months ago
- Sample Spark Code ★91 · Sep 19, 2018 · Updated 7 years ago
- Qubole Sparklens tool for performance tuning Apache Spark ★590 · Jun 26, 2024 · Updated last year
- A benchmark tool for lakehouses. ★14 · Mar 12, 2023 · Updated 2 years ago
- Writing application logic for Spark jobs that can be unit-tested without a SparkContext ★76 · Jan 27, 2019 · Updated 7 years ago
- SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities. ★46 · Jul 8, 2018 · Updated 7 years ago
- ★45 · Apr 27, 2020 · Updated 5 years ago
- Deriving Spark DataFrame schemas from case classes ★44 · Jun 24, 2024 · Updated last year
- Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster. ★300 · Jul 13, 2025 · Updated 7 months ago
- Dashboards to monitor your open source organization's health ★11 · Dec 19, 2019 · Updated 6 years ago
- Gives TreeLog a GUI, the ScalaJS ReactTreeView ★10 · Jun 23, 2016 · Updated 9 years ago
- hive-phoenix-handler is a hive plug-in that can access Apache Phoenix table on HBase using HiveQL. ★10 · Aug 17, 2017 · Updated 8 years ago
- My experiments improving Scala's Future for Scala 2.12 and beyond ★34 · Mar 26, 2018 · Updated 7 years ago
- The missing MatPlotLib for Scala + Spark ★731 · Jan 30, 2022 · Updated 4 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ★3,583 · Feb 17, 2026 · Updated last week
- Optics for Spark DataFrames ★47 · Mar 5, 2021 · Updated 4 years ago
- A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster. ★47 · Aug 1, 2016 · Updated 9 years ago
- Herd-UI is a search and discovery tool for business and technical users. Everyone in your organization can use Herd-UI to browse and unde… ★16 · Oct 1, 2022 · Updated 3 years ago
- Run spark calculations from Ammonite ★117 · Feb 20, 2026 · Updated last week
- Reduce memory usage by running multiple applications in the same JVM. ★13 · Jul 11, 2019 · Updated 6 years ago
- Instructions and examples for installing CNTK on an HDInsight cluster and running CNTK-Pyspark applications from Jupyter notebooks. ★13 · Jul 26, 2018 · Updated 7 years ago
- Ultra-high-performance local IPC framework with Zipkin tracing to conduct a beautiful symphony of (brotherhood) build tooling. ★10 · Jan 8, 2021 · Updated 5 years ago
- An sbt plugin to resolve dependencies using Aether ★13 · Apr 10, 2025 · Updated 10 months ago
- A simplified, lightweight ETL Framework based on Apache Spark ★587 · Jan 24, 2024 · Updated 2 years ago
- Hadoop output committers for S3 ★113 · Jul 9, 2020 · Updated 5 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ★29 · Nov 4, 2024 · Updated last year
- A tool for data sampling, data generation, and data diffing ★345 · Jan 8, 2026 · Updated last month
- A library that brings useful functions from various modern database management systems to Apache Spark ★61 · Sep 4, 2023 · Updated 2 years ago
- Model complex data transformation pipelines easily ★45 · Sep 23, 2022 · Updated 3 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ ★29 · May 15, 2020 · Updated 5 years ago