sakjung / repartipy
Helper for handling PySpark DataFrame partition size 📑🎛️
☆12 · Updated last year
Alternatives and similar repositories for repartipy
Users who are interested in repartipy are comparing it to the libraries listed below.
- ☆11 · Updated 8 months ago
- Knowledge sharing - Cheat sheets ☆11 · Updated 3 weeks ago
- duckdb-etl-framework ☆12 · Updated 6 months ago
- Automate and streamline the alerting & notification process for dbt test results🐞🚀 ☆17 · Updated last month
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆12 · Updated last year
- ☆11 · Updated 5 months ago
- IceRunner is an Apache Arrow Flight Server Implementation for Apache Iceberg Tables ☆9 · Updated 3 months ago
- Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified ☆36 · Updated 2 months ago
- Recipes of publicly available Jupyter images ☆8 · Updated 4 months ago
- Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data type… ☆55 · Updated this week
- Lecture notes, scripts, and material for the lecture of Selected Statistics Topics in the Autonomous University of Querétaro ☆12 · Updated 8 months ago
- This repository serves as a comprehensive reference for both beginners and advanced users of Git. It provides an organized and easy-to-fo… ☆11 · Updated 7 months ago
- This repository contains basic Data Structures code starting from Stack, Queue all the way up to Linked list, Singly linked list and Dou… ☆12 · Updated last year
- clp-ffi-py is a Python library to encode log messages with CLP, and work with the encoded messages using a foreign function interface (FF… ☆11 · Updated 4 months ago
- API for distributing Data Lake Data ☆11 · Updated 3 months ago
- Notes that I should one day turn into a blog or something ... ☆32 · Updated last month
- Boiling Insights - From raw S3 data to charts in seconds ☆19 · Updated 7 months ago
- SQL query executor on remote DuckDB instance using Apache Arrow Flight RPC through Streamlit Web interface. ☆15 · Updated 8 months ago
- ☆52 · Updated this week
- A python library for efficiently interacting and querying SQL databases ☆28 · Updated 3 months ago
- ☆5 · Updated 3 months ago
- Big Data Newsletter ☆23 · Updated last year
- API Framework heavily relying on the power of DuckDB and DuckDB extensions. Ready to build performant and cost-efficient APIs on top of B… ☆37 · Updated 3 weeks ago
- Feature selection for tabular datasets using advanced filter and wrapper methods ☆17 · Updated 4 months ago
- Building a poor man's data lake: Exploring the Power of Polars and Delta Lake ☆10 · Updated last month
- Compare tables within or across databases ☆10 · Updated last month
- A curated list of awesome SQLMesh resources ☆36 · Updated 2 months ago
- Discover the simplicity and strength of DuckDB, dbt, and Iceberg in this project. Create an efficient, versatile data analytics solution … ☆34 · Updated last year
- DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data qualit… ☆59 · Updated 2 weeks ago
- The sane way of building a data layer in Airflow ☆24 · Updated 5 years ago