MrPowers / ceja
PySpark phonetic and string matching algorithms
☆39 Updated last year
Alternatives and similar repositories for ceja:
Users who are interested in ceja are comparing it to the libraries listed below:
- Spark functions to run popular phonetic and string matching algorithms ☆60 Updated 3 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- Read Delta tables without any Spark ☆47 Updated last year
- Spark and Delta Lake Workshop ☆22 Updated 2 years ago
- ☆15 Updated 5 years ago
- Pandas helper functions ☆30 Updated 2 years ago
- Asynchronous actions for PySpark ☆47 Updated 3 years ago
- A simple introduction to using spark ml pipelines ☆26 Updated 6 years ago
- ☆16 Updated last year
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 Updated last year
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 Updated 2 years ago
- An example PySpark project with pytest ☆17 Updated 7 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- Create HTML profiling reports from Apache Spark DataFrames ☆195 Updated 5 years ago
- Helpers & syntactic sugar for PySpark. ☆61 Updated last year
- ☆55 Updated last year
- Projects developed by Domino's R&D team ☆76 Updated 2 years ago
- Delta Lake helper methods. No Spark dependency. ☆22 Updated 6 months ago
- type-class based data cleansing library for Apache Spark SQL ☆78 Updated 5 years ago
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. ☆10 Updated last year
- Delta lake and filesystem helper methods ☆51 Updated last year
- Fake Pandas / PySpark DataFrame creator ☆45 Updated last year
- Composable filesystem hooks and operators for Apache Airflow. ☆17 Updated 3 years ago
- A Table format agnostic data sharing framework ☆38 Updated last year
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- This repository contains NiFi processors for interacting with Snowflake Cloud Data Platform. ☆12 Updated 2 months ago
- Pylint plugin for static code analysis on Airflow code ☆93 Updated 4 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 Updated 3 months ago