sakjung / repartipy
Helper for handling PySpark DataFrame partition size
★12 · Updated last year
Alternatives and similar repositories for repartipy:
Users that are interested in repartipy are comparing it to the libraries listed below
- ★9 · Updated 2 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ★45 · Updated last year
- duckdb-etl-framework ★10 · Updated 4 months ago
- Automate and streamline the alerting & notification process for dbt test results ★17 · Updated 2 months ago
- Skeleton project for Apache Airflow training participants to work on. ★16 · Updated 4 years ago
- CLI for data platform ★19 · Updated last year
- ★18 · Updated 9 months ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ★11 · Updated 11 months ago
- Compare tables within or across databases ★9 · Updated last month
- API for distributing Data Lake Data ★11 · Updated last month
- The sane way of building a data layer in Airflow ★24 · Updated 5 years ago
- SQL query executor on remote DuckDB instance using Apache Arrow Flight RPC through Streamlit Web interface. ★12 · Updated 5 months ago
- ★11 · Updated 5 months ago
- An experimental Athena extension for DuckDB ★54 · Updated 3 months ago
- DataHub on AWS demonstration resources ★10 · Updated 2 years ago
- Provide an easy way with Python to protect your data sources by searching its metadata. ★16 · Updated 2 weeks ago
- ★11 · Updated 5 months ago
- A Python Client for Hive Metastore ★12 · Updated last year
- ★46 · Updated last week
- Entity resolution for everyone. Minimal. No dependencies. ★10 · Updated 2 weeks ago
- A write-audit-publish implementation on a data lake without the JVM ★46 · Updated 8 months ago
- Unity Catalog UI ★40 · Updated 7 months ago
- ★10 · Updated 2 years ago
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ★96 · Updated this week
- dagster scikit-learn pipeline example. ★44 · Updated 2 years ago
- Utility functions for dbt projects running on Spark ★32 · Updated 2 months ago
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ★52 · Updated 2 months ago
- Boiling Insights - From raw S3 data to charts in seconds ★18 · Updated 4 months ago
- Recipes of publicly-available Jupyter images ★8 · Updated last month
- Discover the simplicity and strength of Duckdb, dbt, and Iceberg in this project. Create an efficient, versatile data analytics solution … ★34 · Updated last year