SemyonSinchenko/flake8-pyspark-with-column


A flake8 plugin that detects usage of withColumn in a loop or inside reduce
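
To illustrate the kind of check described above, here is a minimal sketch using Python's standard `ast` module. It is not the plugin's actual implementation; the function names (`find_with_column_misuse`, `_is_with_column`, `_is_reduce_call`) and the sample snippet are hypothetical, and the heuristic simply flags any `.withColumn(...)` call nested inside a `for`/`while` loop or inside a call to something named `reduce`.

```python
import ast


def _is_with_column(node: ast.AST) -> bool:
    # True for calls of the form `<expr>.withColumn(...)`.
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "withColumn"
    )


def _is_reduce_call(node: ast.AST) -> bool:
    # True for `reduce(...)` or `<module>.reduce(...)`, matched by name only.
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
    return name == "reduce"


def find_with_column_misuse(source: str) -> list[int]:
    """Return sorted line numbers of withColumn calls inside loops or reduce."""
    tree = ast.parse(source)
    hits = set()
    for outer in ast.walk(tree):
        if isinstance(outer, (ast.For, ast.While)) or _is_reduce_call(outer):
            for inner in ast.walk(outer):
                if _is_with_column(inner):
                    hits.add(inner.lineno)
    return sorted(hits)


# Hypothetical PySpark-style snippet; it is only parsed, never executed.
SAMPLE = """
from functools import reduce

for name in cols:
    df = df.withColumn(name, make_expr(name))

df = reduce(lambda acc, c: acc.withColumn(c, make_expr(c)), cols, df)
df = df.withColumn("flag", make_expr("flag"))
"""

print(find_with_column_misuse(SAMPLE))  # → [5, 7]
```

The standalone call on the last line of the sample is not reported, since a single `withColumn` outside a loop is not the pattern being targeted. Flagged code is commonly rewritten to build all column expressions first and apply them in a single `select`.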

☆28

Alternatives and similar repositories for flake8-pyspark-with-column

Users that are interested in flake8-pyspark-with-column are comparing it to the libraries listed below.

Neutrinic / flare
Full-stack OpenTelemetry observability for Apache Spark
☆16 · Feb 28, 2026 · Updated 5 months ago
benchsci / tinsel
PySpark schema generator
☆44 · Feb 23, 2023 · Updated 3 years ago
SSripilaipong / lyrid
☆29 · Jan 18, 2023 · Updated 3 years ago
bartosz25 / data-ai-summit-2024
Visits sessionization pipeline used for the talk
☆13 · May 28, 2024 · Updated 2 years ago
databricks / databricks-sql-cli
CLI for querying Databricks SQL
☆45 · Nov 24, 2023 · Updated 2 years ago
094459 / blogpost-airflow-hybrid
Repo that will help you explore how to build a hybrid workflow using Apache Airflow and Amazon ECS Anywhere
☆11 · Jul 12, 2022 · Updated 4 years ago
aws-samples / mwaa-disaster-recovery
Disaster recovery solution for Amazon Managed Workflows for Apache Airflow (MWAA)
☆12 · Apr 27, 2026 · Updated 3 months ago
MrPowers / mack
Delta Lake helper methods in PySpark
☆328 · Jan 19, 2026 · Updated 6 months ago
eGenix / egenix-advanced-match-parsing
Advanced parsing of structured data using Python's new match statement
☆13 · Jan 15, 2025 · Updated last year
best-practice-and-impact / ons-spark
☆21 · Mar 19, 2026 · Updated 4 months ago
StabRise / spark-pdf
PDF DataSource for Apache Spark that allows reading PDF files directly into a DataFrame and running OCR on them
☆81 · Apr 27, 2025 · Updated last year
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆63 · Sep 4, 2023 · Updated 2 years ago
databrickslabs / dqx
Databricks framework to validate Data Quality of pySpark DataFrames and Tables
☆439 · Updated this week
danielbeach / sniffer
csv and flat-file sniffer built in Rust.
☆45 · Jan 26, 2024 · Updated 2 years ago
encero-systems / incan
Incan: a modern, Pythonic language that compiles to Rust! Type-safe, async-friendly, with fixtures, testing, and web/inter-op built in.
☆29 · Updated this week
josephmachado / simple_polars_etl
☆16 · Apr 26, 2024 · Updated 2 years ago
AmadeusITGroup / spark-perf-hikes
Performance Hikes for Apache Spark
☆31 · May 20, 2026 · Updated 2 months ago
godatadriven / airflow-helm
☆11 · Sep 23, 2019 · Updated 6 years ago
unitycatalog / unitycatalog-python
☆19 · Jul 8, 2024 · Updated 2 years ago
gbrueckl / Fabric.Toolbox
Tools for Microsoft Fabric
☆26 · Jun 26, 2026 · Updated last month
mrpowers-io / tsumugi-spark
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
☆26 · Feb 22, 2025 · Updated last year
BauplanLabs / no-jvm-wap-with-iceberg
A write-audit-publish implementation on a data lake without the JVM
☆45 · Aug 12, 2024 · Updated last year
pracdata / duckdb-pipeline
Demonstrating the capabilities of DuckDB as a transformation engine for data lakes
☆34 · Oct 8, 2024 · Updated last year
Luka-J9 / sbt-buf
An SBT Plugin that acts as a light wrapper around Buf.
☆10 · Oct 29, 2024 · Updated last year
bcodell / dbt-activity-schema
A dbt package with a POC implementation of an interface to query activity streams that adhere to the Activity Schema 2.0 spec.
☆18 · May 28, 2026 · Updated 2 months ago
databricks-solutions / agent-monitoring-demo-app
MLFlow 3.0 Agent Monitoring + Databricks Apps + FastAPI
☆16 · Jul 2, 2025 · Updated last year
sdw-online / code_examples_library
The code examples from my online content
☆19 · Sep 29, 2024 · Updated last year
MHromiak / duckbridge
Lightweight Python wrapper around the DuckDB extension, httpserver (extension developed by @quackscience)
☆17 · Sep 24, 2025 · Updated 10 months ago
BuckWoody / workshops
Workshops created by Buck Woody, Data Scientist at Microsoft.
☆16 · May 14, 2024 · Updated 2 years ago
marhar / duckdb_tools
Handy things for duckdb.
☆19 · Apr 29, 2026 · Updated 3 months ago
DataZooDE / anofox-tabular
A duckdb extension which combines data quality and data preparation tools for tabular data.
☆16 · Updated this week
morganmcw / TM-LPG-X
☆10 · Aug 23, 2023 · Updated 2 years ago
sbinet / jdev-go-datascience-2017
DataScience intro with Go for the JDEV-2017
☆10 · Nov 16, 2017 · Updated 8 years ago
SpatioCore / STAC-Atlas
A centralized platform for managing, indexing, and providing STAC (SpatioTemporal Asset Catalog) Collection metadata from distributed cat…
☆22 · Jun 30, 2026 · Updated 3 weeks ago
Siddhu7007 / screen-time-api-agent-skill
Agent skill for building production Screen Time (FamilyControls, ManagedSettings, ManagedSettingsUI, DeviceActivity) iOS features: blocki…
☆23 · Feb 14, 2026 · Updated 5 months ago
mitchelllisle / sparkdantic
✨ A Pydantic to PySpark schema library
☆129 · Updated this week
databrickslabs / dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used …
☆485 · Updated this week
adobe / lake-pulse
A Rust library for analyzing data lake table health — checking the pulse — across multiple formats (Delta Lake, Apache Iceberg, Apache Hu…
☆20 · Jul 11, 2026 · Updated 2 weeks ago
Wuerike / kafka-iceberg-streaming
Docker environment to stream data from Kafka to Iceberg tables
☆30 · Feb 27, 2024 · Updated 2 years ago