sakjung / repartipy
Helper for handling PySpark DataFrame partition size
★12 · Updated last year
Alternatives and similar repositories for repartipy:
Users that are interested in repartipy are comparing it to the libraries listed below
- Dask integration for Snowflake ★30 · Updated 4 months ago
- An experimental Athena extension for DuckDB ★53 · Updated 2 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ★45 · Updated last year
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ★49 · Updated last month
- ★9 · Updated last month
- ★11 · Updated 4 months ago
- ★47 · Updated 2 weeks ago
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily. ★95 · Updated this week
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ★11 · Updated 10 months ago
- ★31 · Updated last year
- A PySpark lib to validate data quality ★18 · Updated 2 years ago
- ⚠️ MAINTENANCE-ONLY MODE: Snowplow maintained SQL data models for working with Snowplow web and mobile behavioral data. ★41 · Updated 2 months ago
- DuckDB API integrations ★29 · Updated last month
- Boiling Insights - From raw S3 data to charts in seconds ★17 · Updated 3 months ago
- duckdb-etl-framework ★10 · Updated 3 months ago
- An infrastructure as code approach to deploying Snowflake using Terraform ★25 · Updated last year
- Sample code to accompany blog post showcasing Arrow Flight SQL running on DuckDB ★32 · Updated 2 years ago
- A serverless DuckDB deployment at GCP ★38 · Updated 2 years ago
- Skeleton project for Apache Airflow training participants to work on. ★16 · Updated 4 years ago
- A conda-smithy repository for python-duckdb. ★13 · Updated last week
- API for distributing Data Lake Data ★10 · Updated this week
- Write your dbt models using Ibis ★64 · Updated last week
- A high-performance data streaming system using DuckDB and Apache Arrow Flight. ★73 · Updated last month
- (Experimental) C/C++ template for DuckDB extensions based on the C API ★13 · Updated last month
- Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plu… ★60 · Updated last year
- Automate and streamline the alerting & notification process for dbt test results ★17 · Updated last month
- ★37 · Updated this week
- ★15 · Updated 4 months ago
- ★89 · Updated 10 months ago
- Plugin for Intake to read from SQL servers ★15 · Updated last year