kuterd / opal_ptx

Experimental GPU language with meta-programming

☆23

Alternatives and similar repositories for opal_ptx

Users who are interested in opal_ptx are comparing it to the repositories listed below.

salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆107 Updated 8 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 10 months ago
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80 Updated last month
HazyResearch / train-tk
train with kittens!
☆63 Updated 11 months ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)
☆66 Updated 6 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72 Updated 5 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆192 Updated 10 months ago
IST-DASLab / QuEST
Work in progress.
☆74 Updated 3 months ago
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆126 Updated last month
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆80 Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 5 months ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆87 Updated 3 weeks ago
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆99 Updated 4 months ago
fal-ai-community / NativeSparseAttention
Research implementation of Native Sparse Attention (2502.11089)
☆61 Updated 7 months ago
hyhieu / easy_pybind
☆32 Updated last year
google-deepmind / asyncdiloco
☆46 Updated last year
ScalingIntelligence / good-kernels
Samples of good AI generated CUDA kernels
☆91 Updated 4 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆158 Updated this week
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆83 Updated 10 months ago
lianakoleva / no-libtorch-compile
☆21 Updated 7 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆101 Updated 3 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆58 Updated this week
kyleliang919 / Super_Muon
☆64 Updated 6 months ago
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆87 Updated last month
gau-nernst / kokoro
https://hf.co/hexgrad/Kokoro-82M
☆14 Updated 7 months ago
dvruette / barrel-rec-pytorch
☆53 Updated last year
huggingface / kernels
Load compute kernels from the Hub
☆299 Updated this week
okarthikb / state-space-models
☆28 Updated last year
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆198 Updated this week
joey00072 / Multi-Head-Latent-Attention-MLA-
Working implementation of DeepSeek MLA
☆44 Updated 9 months ago