kuterd / opal_ptx
Experimental GPU language with meta-programming
☆21 · Updated 6 months ago
Alternatives and similar repositories for opal_ptx:
Users interested in opal_ptx are comparing it to the libraries listed below.
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆60 · Updated this week
- ☆21 · Updated 3 weeks ago
- Experiment of using Tangent to autodiff Triton ☆78 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 3 months ago
- Research implementation of Native Sparse Attention (2502.11089) ☆54 · Updated last month
- ☆47 · Updated this week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated 11 months ago
- ☆49 · Updated last year
- Supporting PyTorch FSDP for optimizers ☆79 · Updated 3 months ago
- ☆13 · Updated 9 months ago
- ☆87 · Updated last year
- High-Performance SGEMM on CUDA devices ☆86 · Updated 2 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆34 · Updated this week
- ☆43 · Updated last year
- Work in progress. ☆50 · Updated last week
- ☆32 · Updated 9 months ago
- Load compute kernels from the Hub ☆99 · Updated this week
- DeMo: Decoupled Momentum Optimization ☆185 · Updated 3 months ago
- Learn CUDA with PyTorch ☆19 · Updated last month
- ☆27 · Updated 8 months ago
- 👷 Build compute kernels ☆17 · Updated this week
- A package for defining deep learning models using categorical algebraic expressions. ☆60 · Updated 7 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated 11 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month
- Working implementation of DeepSeek MLA ☆38 · Updated 2 months ago
- GPU benchmark ☆55 · Updated last month
- Cerule - A Tiny Mighty Vision Model ☆67 · Updated 6 months ago
- Custom Triton kernels for training Karpathy's nanoGPT ☆18 · Updated 5 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers ☆17 · Updated last week
- Make Triton easier ☆47 · Updated 9 months ago