sakjung / repartipy
Helper for handling PySpark DataFrame partition size 📑🎛️
☆12 · Mar 8, 2024 · Updated last year
Alternatives and similar repositories for repartipy
Users who are interested in repartipy are comparing it to the libraries listed below.
- riichi mahjong helper service (마작 탐구생활) ☆16 · May 9, 2023 · Updated 2 years ago
- A suite of PySpark, Pandas, and general pipeline utils for ONS projects. ☆18 · Oct 15, 2025 · Updated 4 months ago
- Python Package to Share/Edit Pandas/Polars DF with web interface! ☆11 · Jun 10, 2025 · Updated 8 months ago
- How to customize Tableau authentication using AWS Athena's JDBC Credentials Provider capabilities. ☆14 · Jun 8, 2020 · Updated 5 years ago
- Adaptive File Source Connector for Spark, optimised for reading from object stores ☆15 · Oct 18, 2022 · Updated 3 years ago
- ☆11 · Nov 26, 2024 · Updated last year
- This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, … ☆16 · Feb 5, 2026 · Updated last week
- Python package for compressing floating-point PyTorch tensors ☆13 · Jul 22, 2024 · Updated last year
- Go wrapper around SSH that speaks AWS API ☆16 · Aug 15, 2023 · Updated 2 years ago
- Pager for tabular data and SQL output ☆12 · Mar 29, 2023 · Updated 2 years ago
- Associated blog post - https://tristanrhodes.com/blog/Adventures-in-Algorithmic-Trading-on-the-Runescape-Grand-Exchange ☆10 · Oct 14, 2024 · Updated last year
- similarity between graph nodes based on local information with PySpark ☆10 · Sep 30, 2022 · Updated 3 years ago
- A Configuration System for Airflow ☆14 · Updated this week
- Deploy an AWS ECS Cluster of EC2 Instances with Terraform ☆13 · Dec 26, 2023 · Updated 2 years ago
- An intelligent predictive text entry platform. Mirror of git://git.code.sf.net/p/presage/presage Please send reports to the SourceForge b… ☆11 · Aug 17, 2015 · Updated 10 years ago
- Apache Arrow Flight example ☆11 · Nov 9, 2020 · Updated 5 years ago
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆10 · Feb 2, 2024 · Updated 2 years ago
- A tool for comparing large S3 buckets ☆17 · Jan 22, 2026 · Updated 3 weeks ago
- adidas Data Mesh implementation ☆12 · May 13, 2022 · Updated 3 years ago
- How to run DBT on AWS Fargate ☆13 · Oct 15, 2019 · Updated 6 years ago
- Jigsawstack Python SDK ☆18 · Updated this week
- ☆12 · Apr 11, 2015 · Updated 10 years ago
- rb_status_plugin : Data confidence tool for Airflow ☆12 · Jan 7, 2023 · Updated 3 years ago
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time. ☆13 · Updated this week
- A Python template optimizing for best practices that remain adaptable over time ☆22 · Jan 23, 2026 · Updated 3 weeks ago
- Ssebowa is a free and open source library in Python that provides generative-ai models. ☆14 · Jan 31, 2024 · Updated 2 years ago
- Type-annotate your spark dataframes and validate them ☆14 · Feb 5, 2026 · Updated last week
- [unix sample manager] search TERABYTES of samples across different (network) disks in few seconds (+offline) ☆16 · Feb 13, 2023 · Updated 3 years ago
- python-hll ☆18 · Dec 26, 2022 · Updated 3 years ago
- ☆13 · Dec 8, 2022 · Updated 3 years ago
- Generate a research database from the IRS 990 E-Filer Returns on AWS. ☆15 · Oct 24, 2022 · Updated 3 years ago
- MOTU midi express 128 linux driver ☆18 · Jan 13, 2026 · Updated last month
- Modern Python Type Hinting Guide (current as of Python 3.14). Shared publicly for review and comment. Not licensed for redistribution or … ☆32 · Jan 22, 2026 · Updated 3 weeks ago
- A library to mutate parquet files ☆19 · May 9, 2023 · Updated 2 years ago
- Unity + TensorRT integration ☆15 · Nov 27, 2018 · Updated 7 years ago
- An open source alternative to the AWS Console! ☆15 · Aug 18, 2021 · Updated 4 years ago
- libcaca library to emscripten ☆24 · Mar 10, 2014 · Updated 11 years ago
- Table Enforcer is my attempt to apply a sort of "test driven development" workflow to data cleaning and validation. A python package to f… ☆19 · Feb 26, 2018 · Updated 7 years ago
- This repository builds a production-ready Docker image to productionalize an MLFlow cluster ☆12 · Jan 19, 2021 · Updated 5 years ago