kuterd / opal_ptx
Experimental GPU language with meta-programming
☆23 Updated last year
Alternatives and similar repositories for opal_ptx
Users that are interested in opal_ptx are comparing it to the libraries listed below
- High-Performance SGEMM on CUDA devices ☆107 Updated 8 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated 10 months ago
- H-Net Dynamic Hierarchical Architecture ☆80 Updated last month
- train with kittens! ☆63 Updated 11 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Updated 6 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆72 Updated 5 months ago
- DeMo: Decoupled Momentum Optimization ☆192 Updated 10 months ago
- Work in progress. ☆74 Updated 3 months ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆126 Updated last month
- Experiment of using Tangent to autodiff triton ☆80 Updated last year
- Collection of autoregressive model implementations ☆86 Updated 5 months ago
- Learn CUDA with PyTorch ☆87 Updated 3 weeks ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆99 Updated 4 months ago
- research impl of Native Sparse Attention (2502.11089) ☆61 Updated 7 months ago
- ☆32 Updated last year
- ☆46 Updated last year
- Samples of good AI generated CUDA kernels ☆91 Updated 4 months ago
- 👷 Build compute kernels ☆158 Updated this week
- supporting pytorch FSDP for optimizers ☆83 Updated 10 months ago
- ☆21 Updated 7 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆101 Updated 3 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆58 Updated this week
- ☆64 Updated 6 months ago
- SIMD quantization kernels ☆87 Updated last month
- https://hf.co/hexgrad/Kokoro-82M ☆14 Updated 7 months ago
- ☆53 Updated last year
- Load compute kernels from the Hub ☆299 Updated this week
- ☆28 Updated last year
- Quantized LLM training in pure CUDA/C++. ☆198 Updated this week
- working implementation of deepseek MLA ☆44 Updated 9 months ago