MrPowers / farsante

Fake Pandas / PySpark DataFrame creator

☆48

Alternatives and similar repositories for farsante

Users who are interested in farsante are comparing it to the libraries listed below.

mrpowers-io / jodie
Delta lake and filesystem helper methods
☆51 Updated last year
MrPowers / beavis
Pandas helper functions
☆31 Updated 2 years ago
fugue-project / tutorials
Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou…
☆114 Updated last year
jeppe742 / DeltaLakeReader
Read Delta tables without any Spark
☆47 Updated last year
jonathanneo / data-aware-orchestration
Data-aware orchestration with dagster, dbt, and airbyte
☆30 Updated 2 years ago
jwills / de4ml
Supporting materials/code examples for my course in data engineering for machine learning.
☆38 Updated 2 years ago
dask-contrib / dask-snowflake
Dask integration for Snowflake
☆30 Updated 2 months ago
Nike-Inc / spark-expectations
A Python Library to support running data quality rules while the spark job is running⚡
☆190 Updated this week
canimus / cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
☆223 Updated this week
danielbeach / tinytimmy
A simple and easy to use Data Quality (DQ) tool built with Python.
☆50 Updated 2 years ago
dbt-labs / spark-utils
Utility functions for dbt projects running on Spark
☆33 Updated 8 months ago
Nike-Inc / brickflow
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
☆218 Updated 2 weeks ago
great-expectations / great_expectations_action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
☆81 Updated last year
mitchelllisle / sparkdantic
✨ A Pydantic to PySpark schema library
☆108 Updated last week
danielbeach / sniffer
csv and flat-file sniffer built in Rust.
☆43 Updated last year
rafaelpierre / pyjaws
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
☆43 Updated this week
delta-io / delta-docs
Delta Lake Documentation
☆50 Updated last year
infinitelambda / dq-tools
Make simple storing test results and visualisation of these in a BI dashboard
☆47 Updated last month
sodadata / soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
☆64 Updated 3 years ago
delta-io / delta-examples
Delta Lake examples
☆230 Updated last year
josephmachado / cost_effective_data_pipelines
Cost Efficient Data Pipelines with DuckDB
☆57 Updated 5 months ago
BauplanLabs / no-jvm-wap-with-iceberg
A write-audit-publish implementation on a data lake without the JVM
☆46 Updated last year
astronomer / airflow-provider-great-expectations
Great Expectations Airflow operator
☆167 Updated last week
binste / dbt-ibis
Write your dbt models using Ibis
☆71 Updated 7 months ago
sodadata / soda-sql
Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
☆62 Updated 2 years ago
ssp-data / data-engineering-devops
Full stack data engineering tools and infrastructure set-up
☆56 Updated 4 years ago
danielbeach / lakescum
A Python package to help Databricks Unity Catalog users to read and query Delta Lake tables with Polars, DuckDb, or PyArrow.
☆26 Updated last year
Spratiher9 / JumpSpark
JumpSpark - A modern cookiecutter template for pyspark projects with batteries included.
☆10 Updated 2 years ago
astronomer / airflow-testing-guide
☆23 Updated 4 years ago
mehd-io / duckdb-dataviz-demo
DuckDB with Dashboarding tools demo evidence, streamlit and rill
☆21 Updated last year