Hand-Rolled GPU communications library
☆93 · Nov 25, 2025 · Updated 6 months ago
Alternatives and similar repositories for Penny
Users that are interested in Penny are comparing it to the libraries listed below.
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 11 months ago
- An implementation of libc, attempting to be compliant with C89, C99 and C11 standards ☆14 · May 11, 2026 · Updated 2 weeks ago
- ☆27 · May 27, 2024 · Updated 2 years ago
- Benchmark of different C or C++ loggers ☆12 · Sep 13, 2023 · Updated 2 years ago
- Vector Search Benchmarking suite ☆16 · May 4, 2026 · Updated 3 weeks ago
- CUPTI based GPU profiling library exposing usdt hooks ☆31 · May 20, 2026 · Updated last week
- Cute layout visualization ☆39 · Jan 18, 2026 · Updated 4 months ago
- Fast GPU based tensor core reductions ☆13 · Jan 13, 2023 · Updated 3 years ago
- Optimized primitives for collective multi-GPU communication ☆10 · May 8, 2024 · Updated 2 years ago
- Quantize transformers to any learned arbitrary 4-bit numeric format ☆57 · Apr 13, 2026 · Updated last month
- Quantized LLM training in pure CUDA/C++. ☆246 · Mar 6, 2026 · Updated 2 months ago
- Perplexity GPU Kernels ☆576 · Nov 7, 2025 · Updated 6 months ago
- Low overhead tracing library and trace visualizer for pipelined CUDA kernels ☆137 · Nov 26, 2025 · Updated 6 months ago
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning ☆34 · Jun 13, 2025 · Updated 11 months ago
- CS341 for Spring 2024 ☆11 · Jul 15, 2024 · Updated last year
- The living Trust and Safety User Guide for the AI Alliance (https://thealliance.ai) ☆16 · May 21, 2026 · Updated last week
- ☆30 · Oct 24, 2025 · Updated 7 months ago
- Marketplace ML experiment - training without backprop ☆27 · Sep 9, 2025 · Updated 8 months ago
- A lightweight computational physics framework, based on the organization of turboWAVE. Implements a "Simulation, PhysicsModule, ComputeTo… ☆12 · Apr 1, 2026 · Updated last month
- GPU-accelerated LLM Training Simulator ☆19 · Jun 26, 2025 · Updated 11 months ago
- ☆14 · Aug 9, 2023 · Updated 2 years ago
- Benchmarking Goal-Oriented Software Engineering ☆158 · May 5, 2026 · Updated 3 weeks ago
- a benchmark to evaluate the situated inductive reasoning ☆15 · Jan 7, 2025 · Updated last year
- Personal solutions to the Triton Puzzles ☆21 · Jul 18, 2024 · Updated last year
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆359 · May 6, 2026 · Updated 3 weeks ago
- ☆93 · Jul 5, 2024 · Updated last year
- ☆14 · Apr 16, 2025 · Updated last year
- We develop world models that can be adapted with natural language. Integrating these models into artificial agents allows humans to effe… ☆25 · Feb 10, 2024 · Updated 2 years ago
- minimal C implementation of speculative decoding based on llama2.c ☆30 · Jul 15, 2024 · Updated last year
- FrontierSWE is an ultra long-horizon coding agent benchmark that tests implementation, performance eng and ML research ☆121 · Apr 30, 2026 · Updated 3 weeks ago
- Incubating P/D sidecar for llm-d ☆17 · Nov 13, 2025 · Updated 6 months ago
- BFloat16 Fused Adam Operator for PyTorch ☆19 · Nov 16, 2024 · Updated last year
- A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch ☆24 · Updated this week
- 🧠 Workshop Notebook and assets for the Anthropic Hackathon ☆12 · Nov 4, 2023 · Updated 2 years ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆1,030 · Mar 24, 2026 · Updated 2 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆100 · Sep 19, 2025 · Updated 8 months ago
- Explore training for quantized models ☆26 · Jul 12, 2025 · Updated 10 months ago
- ☆44 · Aug 21, 2025 · Updated 9 months ago
- Persistent Kernel + JIT-Injected Operators (CUDA) ☆47 · Jan 27, 2026 · Updated 4 months ago