getyourguide / TypedPyspark
Type-annotate your spark dataframes and validate them
☆14 · Updated 2 years ago
Alternatives and similar repositories for TypedPyspark
Users that are interested in TypedPyspark are comparing it to the libraries listed below
- PySpark phonetic and string matching algorithms · ☆41 · Updated last year
- Apache Avro <-> pandas DataFrame · ☆137 · Updated 5 months ago
- Annotation Management for Prodigy, that support multiple users working in many projects · ☆15 · Updated 7 years ago
- Functional Airflow DAG definitions. · ☆38 · Updated 8 years ago
- Package virtual environments for redistribution · ☆46 · Updated 3 years ago
- Import Databricks notebooks as libraries/modules · ☆15 · Updated 3 years ago
- Time everything in IPython · ☆126 · Updated 2 years ago
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… · ☆30 · Updated 3 years ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python · ☆142 · Updated last year
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames. · ☆102 · Updated 6 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… · ☆29 · Updated last year
- SQLAlchemy dialect for EXASOL · ☆36 · Updated this week
- Tools for faster and optimized interaction with Teradata and large datasets. · ☆17 · Updated 7 years ago
- Record matching and entity resolution at scale in Spark · ☆36 · Updated 2 years ago
- Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Sear… · ☆86 · Updated 4 years ago
- Type System for Data Analysis in Python · ☆215 · Updated last year
- Python package for deduplication/entity resolution using active learning · ☆83 · Updated last year
- ☄️ Parallel and distributed training with spaCy and Ray · ☆56 · Updated 2 years ago
- A simple ElasticSearch plugin wrapping around the search endpoint to provide Rocchio query expansion · ☆17 · Updated 8 years ago
- Asynchronous actions for PySpark · ☆48 · Updated 4 years ago
- pytest plugin to run the tests with support of pyspark · ☆88 · Updated 8 months ago
- Pandas helper functions · ☆31 · Updated 2 years ago
- Pandas ExtensionDType/Array backed by Apache Arrow · ☆232 · Updated 2 years ago
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies · ☆21 · Updated 3 years ago
- Read Delta tables without any Spark · ☆47 · Updated last year
- 🎯 kettle is a CLI tool for creating and deploying cloud functions & docker containers for machine learning · ☆31 · Updated 3 years ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines · ☆57 · Updated 7 months ago
- A Cython implementation of the affine gap string distance · ☆57 · Updated 3 years ago
- Marshmallow Schema generator for Pandas DataFrames · ☆24 · Updated 5 years ago
- Match schema attributes of relational databases by value similarity. As a study assignment, this isn't well documented, but you can conta… · ☆24 · Updated 6 years ago