sashgorokhov / pyspark-spy
Collect and aggregate on spark events for profitz
☆11 · Updated 3 years ago
Alternatives and similar repositories for pyspark-spy
Users who are interested in pyspark-spy are comparing it to the libraries listed below.
- Helpers & syntactic sugar for PySpark. ☆62 · Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Dask integration for Snowflake ☆30 · Updated 8 months ago
- Using the Parquet file format with Python ☆15 · Updated last year
- Read Delta tables without any Spark ☆47 · Updated last year
- Airflow plugin to transfer arbitrary files between operators ☆78 · Updated 6 years ago
- A pyspark lib to validate data quality ☆18 · Updated 2 years ago
- Build your feature store with macros right within your dbt repository ☆39 · Updated 2 years ago
- Composable filesystem hooks and operators for Apache Airflow. ☆17 · Updated 4 years ago
- PySpark schema generator ☆43 · Updated 2 years ago
- CLI for data platform ☆19 · Updated last year
- Asynchronous actions for PySpark ☆47 · Updated 3 years ago
- Parse dbt artifacts and search dbt models with Algolia ☆52 · Updated 4 years ago
- DBT Cloud Plugin for Airflow ☆38 · Updated last year
- Pylint plugin for static code analysis on Airflow code ☆95 · Updated 4 years ago
- The sane way of building a data layer in Airflow ☆24 · Updated 5 years ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- Pandas helper functions ☆31 · Updated 2 years ago
- Projects developed by Domino's R&D team ☆78 · Updated 3 years ago
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️ ☆17 · Updated 2 months ago
- A library to mutate parquet files ☆19 · Updated 2 years ago
- Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data. ☆79 · Updated last month
- This repository is no longer maintained. ☆15 · Updated 3 years ago
- Collection of utility scripts to extract code so it can be upgraded to SnowFlake using the SnowConvert tool. ☆20 · Updated last week
- scaffold of Apache Airflow executing Docker containers ☆85 · Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- Official dbt adapter for Vertica ☆25 · Updated last month
- A Python client for Apache Livy, enabling use of remote Apache Spark clusters. ☆70 · Updated 3 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 · Updated last year
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆110 · Updated this week