SzymonOzog / Penny
Hand-Rolled GPU communications library
☆84 · Nov 25, 2025 · Updated 2 months ago
Alternatives and similar repositories for Penny
Users who are interested in Penny are comparing it to the libraries listed below.
- Cute layout visualization ☆30 · Jan 18, 2026 · Updated 3 weeks ago
- ☆14 · May 18, 2025 · Updated 8 months ago
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 8 months ago
- Faster Whisper ASR transcription with CTranslate2 ☆24 · Oct 25, 2024 · Updated last year
- Perplexity GPU Kernels ☆560 · Nov 7, 2025 · Updated 3 months ago
- Benchmarking Goal-Oriented Software Engineering ☆107 · Jan 7, 2026 · Updated last month
- Experiments with BitNet inference on CPU ☆55 · Apr 1, 2024 · Updated last year
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning ☆31 · Jun 13, 2025 · Updated 8 months ago
- Quantize transformers to any learned arbitrary 4-bit numeric format ☆51 · Jan 25, 2026 · Updated 3 weeks ago
- A bunch of kernels that might make stuff slower 😉 ☆75 · Updated this week
- Quantized LLM training in pure CUDA/C++. ☆238 · Jan 20, 2026 · Updated 3 weeks ago
- Physics Informed Neural Networks (PINNs) + SPINNs + HyperPINNs + Adaptative Loss Weights with JAX 📓 Check out our various notebooks to g… ☆45 · Updated this week
- ☆34 · Sep 10, 2024 · Updated last year
- ☆27 · May 27, 2024 · Updated last year
- Notes and artifacts from the ONNX steering committee ☆28 · Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆194 · Updated this week
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated last month
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 · Jan 28, 2026 · Updated 2 weeks ago
- This repository contains code, which was used to generate large-scale results in the HINTS paper. ☆36 · Oct 17, 2024 · Updated last year
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆353 · Dec 3, 2025 · Updated 2 months ago
- ☆38 · Mar 9, 2021 · Updated 4 years ago
- This repository contains the results and code for the MLPerf™ Training v2.0 benchmark. ☆29 · Feb 23, 2024 · Updated last year
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage ☆70 · Feb 6, 2026 · Updated last week
- All Resources from Stanford CS106B 2021 ☆23 · Jul 11, 2025 · Updated 7 months ago
- Memory Topology for GPUs ☆17 · Dec 9, 2025 · Updated 2 months ago
- ext_mpi_collectives ☆11 · Apr 1, 2025 · Updated 10 months ago
- The Forward-Forward Algorithm for Drug Discovery ☆33 · Dec 30, 2022 · Updated 3 years ago
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆66 · Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 · Sep 19, 2025 · Updated 4 months ago
- [CVPR 2021] FMO Deblurring Benchmark ☆13 · Jan 12, 2022 · Updated 4 years ago
- How to build an ACP compliant agent that uses MCP as well! ☆11 · May 6, 2025 · Updated 9 months ago
- Residual Quantization Autoencoder, used for interpreting LLMs ☆13 · Jan 1, 2025 · Updated last year
- 2D time-domain isotropic (visco)elastic FD modeling and full waveform inversion (FWI) code for SH-waves ☆13 · Aug 9, 2020 · Updated 5 years ago
- Official implementation of the paper "LTrack: Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Rep… ☆12 · Jul 26, 2023 · Updated 2 years ago
- A Library for Scaling Mixed-Integer Optimization-Based Machine Learning. ☆12 · Jun 24, 2024 · Updated last year
- ☆11 · Feb 27, 2024 · Updated last year
- Scripts for creating mirror repositories that do not have .pre-commit-hooks.yaml ☆48 · Jan 16, 2026 · Updated last month
- Code for paper "Beyond Closure Models: Learning Chaotic Systems via Physics-Informed Neural Operators". ☆14 · Dec 24, 2025 · Updated last month
- Project focused on enhancing the quality of low-fidelity endoscopy images using Generative Adversarial Networks (GANs) implemented in PyT… ☆17 · Jun 5, 2025 · Updated 8 months ago