Hand-Rolled GPU communications library
☆87, updated Nov 25, 2025
Alternatives and similar repositories for Penny
Users who are interested in Penny compare it with the libraries listed below.
- Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025 (☆31, updated Oct 22, 2025)
- Cute layout visualization (☆31, updated Jan 18, 2026)
- Fast GPU based tensor core reductions (☆13, updated Jan 13, 2023)
- My submission for the GPUMODE/AMD fp8 mm challenge (☆29, updated Jun 4, 2025)
- Faster Whisper ASR transcription with CTranslate2 (☆24, updated Oct 25, 2024)
- Perplexity GPU Kernels (☆567, updated Nov 7, 2025)
- Benchmarking Goal-Oriented Software Engineering (☆120, updated Jan 7, 2026)
- Official implementation for SSDD Single-Step Diffusion Decoder for Efficient Image Tokenization (☆55, updated Nov 12, 2025)
- Quantize transformers to any learned arbitrary 4-bit numeric format (☆51, updated Jan 25, 2026)
- A bunch of kernels that might make stuff slower 😉 (☆75, updated Mar 2, 2026)
- Quantized LLM training in pure CUDA/C++ (☆240, updated this week)
- Physics Informed Neural Networks (PINNs) + SPINNs + HyperPINNs + Adaptative Loss Weights with JAX 📓 Check out our various notebooks to g… (☆45, updated this week)
- Notes and artifacts from the ONNX steering committee (☆28, updated Feb 26, 2026)
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels (☆196, updated this week)
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling (☆42, updated Dec 29, 2025)
- Create and deploy virtual-experiments - co-processing computational workflows (☆10, updated Jan 28, 2026)
- Automatic R pipeline for FGSEA and classic enrichment (GO, KEGG, Reactome, Hallmark) (☆18, updated Oct 30, 2025)
- An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included (☆109, updated Feb 24, 2026)
- This repository contains code that was used to generate large-scale results in the HINTS paper (☆36, updated Oct 17, 2024)
- This repository contains the results and code for the MLPerf™ Training v2.0 benchmark (☆29, updated Feb 23, 2024)
- All Resources from Stanford CS106B 2021 (☆24, updated Jul 11, 2025)
- OSI4IOT Platform (☆13, updated Mar 2, 2026)
- Memory Topology for GPUs (☆18, updated this week)
- ext_mpi_collectives (☆11, updated Apr 1, 2025)
- Bulk/Mass download your stuff via a batch of URLs easily (☆17, updated Mar 17, 2025)
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage (☆73, updated this week)
- PyTorch Single Controller (☆987, updated this week)
- Framework to reduce autotune overhead to zero for well-known deployments (☆97, updated Sep 19, 2025)
- CS341 for Spring 2024 (☆11, updated Jul 15, 2024)
- A Redis-compatible in-memory database server written in Rust with MLua-based Lua 5.1 scripting (☆17, updated Nov 28, 2025)
- Official implementation of the paper "LTrack: Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Rep… (☆12, updated Jul 26, 2023)
- OpenMP offload playground (☆10, updated Nov 16, 2024)
- Argonne Leadership Computing Facility OpenCL tutorial (☆10, updated Aug 22, 2025)
- UI Repository for Magistrala IoT (☆11, updated Aug 5, 2024)