kaiko-ai / typedsparkLinks

Column-wise type annotations for pyspark DataFrames

☆88

Alternatives and similar repositories for typedspark

Users that are interested in typedspark are comparing it to the libraries listed below

Sorting:

G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆231Updated 3 months ago
delta-io / kafka-delta-ingest
A highly efficient daemon for streaming data from Kafka into Delta Lake
☆414Updated 5 months ago
mitchelllisle / sparkdantic
✨ A Pydantic to PySpark schema library
☆108Updated last week
mrpowers-io / jodie
Delta lake and filesystem helper methods
☆51Updated last year
Nike-Inc / spark-expectations
A Python Library to support running data quality rules while the spark job is running⚡
☆190Updated this week
tekumara / fakesnow
Run, mock and test fake Snowflake databases locally.
☆153Updated 2 weeks ago
Nike-Inc / brickflow
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
☆218Updated 2 weeks ago
dataflint / spark
Drop-in replacement for Apache Spark UI
☆323Updated 2 weeks ago
MrPowers / mack
Delta Lake helper methods in PySpark
☆323Updated last year
ananthdurai / schemata
Schema modelling framework for decentralised domain-driven ownership of data.
☆259Updated last year
benchsci / tinsel
PySpark schema generator
☆43Updated 2 years ago
mrpowers-io / spark-style-guide
Spark style guide
☆263Updated last year
eakmanrq / sqlframe
Turning PySpark Into a Universal DataFrame API
☆443Updated last week
Qbeast-io / qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
☆233Updated 9 months ago
starburstdata / dbt-trino
The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)
☆248Updated last month
getindata / dbt-flink-adapter
Adapter for dbt that executes dbt pipelines on Apache Flink
☆95Updated last year
projectnessie / iceberg-catalog-migrator
CLI tool to bulk migrate the tables from one catalog another without a data copy
☆81Updated 6 months ago
getindata / kafka-connect-iceberg-sink
☆80Updated 6 months ago
AbsaOSS / spark-hats
Nested array transformation helper extensions for Apache Spark
☆37Updated 2 years ago
duckdb / duckdb-iceberg
☆328Updated last week
holdenk / spark-flowchart
Flowchart for debugging Spark applications
☆107Updated last year
AbsaOSS / ABRiS
Avro SerDe for Apache Spark structured APIs.
☆235Updated 4 months ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow runnig custom code on the executors as they are in…
☆93Updated 5 months ago
gabledata / recap
Work with your web service, database, and streaming schemas in a single format.
☆343Updated last month
getindata / dbt-airflow-factory
Library to convert DBT manifest metadata to Airflow tasks
☆49Updated last year
dbt-labs / dbt-spark
This repository has moved into https://github.com/dbt-labs/dbt-adapters
☆442Updated 3 months ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆60Updated 2 years ago
SETL-Framework / setl
A simple Spark-powered ETL framework that just works 🍺
☆182Updated 3 weeks ago
sodadata / soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
☆64Updated 3 years ago
SemyonSinchenko / flake8-pyspark-with-column
A flake8 plugin that detects of usage withColumn in a loop or inside reduce
☆28Updated 4 months ago