MrPowers / ceja
PySpark phonetic and string matching algorithms
☆39 · Updated last year
Alternatives and similar repositories for ceja:
Users who are interested in ceja are comparing it to the libraries listed below.
- Data validation library for PySpark 3.0.0 · ☆33 · Updated 2 years ago
- Spark functions to run popular phonetic and string matching algorithms · ☆60 · Updated 3 years ago
- Instant search for and access to many datasets in Pyspark. · ☆34 · Updated 2 years ago
- ☆15 · Updated 5 years ago
- Read Delta tables without any Spark · ☆47 · Updated last year
- ☆16 · Updated 2 years ago
- Create HTML profiling reports from Apache Spark DataFrames · ☆196 · Updated 5 years ago
- Asynchronous actions for PySpark · ☆47 · Updated 3 years ago
- Pandas helper functions · ☆30 · Updated 2 years ago
- Fake Pandas / PySpark DataFrame creator · ☆46 · Updated last year
- An example PySpark project with pytest · ☆16 · Updated 7 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy-to-use API 🌟 · ☆53 · Updated 3 years ago
- Helpers & syntactic sugar for PySpark. · ☆62 · Updated last year
- Python API for Deequ · ☆41 · Updated 4 years ago
- Record matching and entity resolution at scale in Spark · ☆34 · Updated last year
- Spark and Delta Lake Workshop · ☆22 · Updated 2 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark DataFrames · ☆63 · Updated 2 years ago
- Spark NLP for Streamlit · ☆15 · Updated 3 years ago
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames. · ☆103 · Updated 5 years ago
- A pyspark lib to validate data quality · ☆18 · Updated 2 years ago
- Tools for faster and optimized interaction with Teradata and large datasets. · ☆17 · Updated 6 years ago
- A simple introduction to using spark ml pipelines · ☆26 · Updated 7 years ago
- A series of workshop modules introducing Feast feature store. · ☆19 · Updated 2 years ago
- How to evaluate the Quality of your Data with Great Expectations and Spark. · ☆31 · Updated 2 years ago
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. · ☆10 · Updated last year
- Delta lake and filesystem helper methods · ☆51 · Updated last year
- PySpark data-pipeline testing and CICD · ☆28 · Updated 4 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark · ☆58 · Updated last year
- [ARCHIVED] The Presto adapter plugin for dbt Core · ☆33 · Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… · ☆113 · Updated last year