sakjung / repartipy
Helper for handling PySpark DataFrame partition size 📑🎛️
☆12 Updated 10 months ago
Alternatives and similar repositories for repartipy:
Users who are interested in repartipy are comparing it to the repositories listed below.
- Lecture notes, scripts, and material for the lecture of Selected Statistics Topics in the Autonomous University of Querétaro ☆13 Updated 2 months ago
- ☆6 Updated 3 weeks ago
- Local-first federated analytics query engine using DuckDB. ☆17 Updated last month
- Delta reader for the Ray open-source toolkit for building ML applications ☆43 Updated 11 months ago
- This repository serves as a comprehensive reference for both beginners and advanced users of Git. It provides an organized and easy-to-fo… ☆12 Updated last month
- GitGittu is a simple tool that allows users to view their GitHub repositories, gists, followers, and following. The tool is built with Ja… ☆17 Updated 8 months ago
- Auto Deploy Node.js REST API on AWS EC2 | CI/CD Pipeline using GitHub Actions ☆12 Updated 10 months ago
- dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases. ☆57 Updated 2 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆12 Updated 7 months ago
- ☆12 Updated 2 months ago
- Analyze databases given as csv files with SQL schema and perform manipulation on them ☆9 Updated 3 months ago
- Apache Arrow Development Experiments ☆14 Updated last month
- Automate and streamline the alerting & notification process for dbt test results🐞🚀 ☆17 Updated this week
- Filter faster, analyze smarter – because your DataFrames deserve it! ☆18 Updated 3 months ago
- Helm Charts for Prefect Server to expose the open source UI ☆25 Updated 4 years ago
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ☆44 Updated last week
- This is a repository with a web client for the NEAR application, which is a convenient platform for notifying users about emergency situa… ☆14 Updated this week
- Recipes of publicly-available Jupyter images ☆9 Updated 3 months ago
- ☆18 Updated 5 months ago
- Entity resolution for everyone. Minimal. No dependencies. ☆11 Updated 5 months ago
- A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON in… ☆24 Updated 2 months ago
- Kedro plugin that extends support for partitioned data processing in Kedro ☆10 Updated 5 months ago
- Dask integration for Snowflake ☆30 Updated 2 months ago
- ☆10 Updated last month
- Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from… ☆30 Updated this week
- An experimental Athena extension for DuckDB 🐤 ☆51 Updated 2 weeks ago
- A DataOps framework for building a lakehouse. ☆34 Updated this week
- Data Catalogs Made Easy ☆22 Updated last month
- ☆11 Updated last month