tgautam03 / tGeMM
General Matrix Multiplication using NVIDIA Tensor Cores
☆11 Updated 2 months ago
Alternatives and similar repositories for tGeMM:
Users who are interested in tGeMM are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆87 Updated 2 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆60 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆34 Updated this week
- Custom PTX Instruction Benchmark ☆119 Updated last month
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- Learning about CUDA by writing PTX code. ☆124 Updated last year
- LLM training in simple, raw C/CUDA ☆92 Updated 10 months ago
- ☆13 Updated 2 weeks ago
- Experimental GPU language with meta-programming ☆21 Updated 6 months ago
- Learn CUDA with PyTorch ☆19 Updated last month
- JAX implementation of the Mistral 7b v0.2 model ☆35 Updated 8 months ago
- A parallel framework for training deep neural networks ☆57 Updated last week
- This is a port of Mistral-7B model in JAX ☆32 Updated 8 months ago
- train with kittens! ☆54 Updated 5 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆32 Updated 8 months ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 Updated 9 months ago
- ☆87 Updated last year
- Collection of kernels written in Triton language ☆114 Updated last month
- Reference Kernels for the Leaderboard ☆23 Updated 3 weeks ago
- Jax like function transformation engine but micro, microjax ☆30 Updated 5 months ago
- ML/DL Math and Method notes ☆59 Updated last year
- ☆87 Updated 2 weeks ago
- pytorch from scratch in pure C/CUDA and python ☆40 Updated 5 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 8 months ago
- ☆27 Updated 2 months ago
- ☆28 Updated 2 months ago
- GPU documentation for humans ☆31 Updated 3 weeks ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 Updated 3 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last week
- extensible collectives library in triton ☆84 Updated 6 months ago