intel / tiny-dpcpp-nn
SYCL implementation of Fused MLPs for Intel GPUs
☆43 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for tiny-dpcpp-nn
- ☆32 Updated 5 months ago
- ☆29 Updated this week
- ☆145 Updated this week
- LLM training in simple, raw C/CUDA ☆86 Updated 6 months ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆79 Updated 5 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆29 Updated 3 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆184 Updated last month
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last month
- Learning about CUDA by writing PTX code. ☆28 Updated 8 months ago
- Extensible collectives library in Triton ☆65 Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆35 Updated 6 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆65 Updated 10 months ago
- ☆14 Updated last month
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆140 Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆53 Updated this week
- ☆59 Updated this week
- ☆18 Updated last month
- ☆42 Updated 11 months ago
- ☆39 Updated last month
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆85 Updated 3 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX ☆145 Updated last month
- rocWMMA ☆91 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆61 Updated this week
- ☆48 Updated 8 months ago
- [WIP] Context parallel attention that works with torch.compile ☆20 Updated last week
- End to End steps for adding custom ops in PyTorch. ☆19 Updated 4 years ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆95 Updated this week
- Development repository for the Triton language and compiler ☆93 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆211 Updated 3 months ago