robert-altmiller / code_profiler

☆16

Alternatives and similar repositories for code_profiler

Users who are interested in code_profiler are comparing it to the libraries listed below.

elsevierlabs-os / NotebookDiscovery
Notebook Discovery Tool for Databricks notebooks
☆19 Updated 3 years ago
Data-drone / DAIS2022-Scaling-Deep-Learning-Talk
☆10 Updated 3 years ago
epec254 / rag_code
☆14 Updated last year
StabRise / spark-pdf
PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them
☆78 Updated 9 months ago
SemyonSinchenko / flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
☆28 Updated 7 months ago
holdenk / spark-upgrade
Magic to help Spark pipelines upgrade
☆34 Updated last year
zheyuan28 / SparkTaskMetrics
Task Metrics Explorer
☆14 Updated 6 years ago
snowflakedb / spark-snowflake
Snowflake Data Source for Apache Spark.
☆230 Updated 3 weeks ago
AbePabbathi / lakehouse-tacklebox
This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse.
☆46 Updated last year
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 Updated 4 months ago
mrchristine / db-migration
Databricks Migration Tools
☆43 Updated 4 years ago
fivetran / benchmark
Benchmark data warehouses under Fivetran-like conditions
☆171 Updated 3 years ago
Nike-Inc / spark-expectations
A Python library to support running data quality rules while the Spark job is running ⚡
☆197 Updated this week
dbt-labs / dbt-spark
This repository has moved into https://github.com/dbt-labs/dbt-adapters
☆443 Updated 6 months ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆61 Updated 2 years ago
G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆232 Updated 2 weeks ago
HeartSaVioR / spark-state-tools
Spark Structured Streaming State Tools
☆34 Updated 5 years ago
Nike-Inc / brickflow
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
☆226 Updated this week
allisonwang-db / pyspark-data-sources
Custom PySpark Data Sources
☆85 Updated last week
Snowflake-Labs / OpenLineage-AccessHistory-Setup
Guideline to extract table lineage info in OpenLineage format from access history view
☆14 Updated 2 years ago
delta-io / delta-examples
Delta Lake examples
☆238 Updated last year
MrPowers / mack
Delta Lake helper methods in PySpark
☆327 Updated 2 weeks ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 Updated 3 years ago
Teradata / tpcds
Port of TPC-DS dsdgen to Java
☆50 Updated last year
FRosner / drunken-data-quality
Spark package for checking data quality
☆222 Updated 5 years ago
zalando-incubator / spark-json-schema
JSON schema parser for Apache Spark
☆82 Updated 3 years ago
acroz / pylivy
A Python client for Apache Livy, enabling use of remote Apache Spark clusters.
☆70 Updated 4 years ago
cloudera / dbt-spark-livy
The dbt-spark-livy adapter allows you to use dbt along with Apache Spark, by connecting via Apache Livy
☆12 Updated 2 years ago
dbt-labs / dbt-presto
[ARCHIVED] The Presto adapter plugin for dbt Core
☆32 Updated 2 years ago
ExpediaGroup / circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
☆91 Updated last year