stanford-cs149 / asst4-trainium
☆27 Updated 5 months ago
Alternatives and similar repositories for asst4-trainium:
Users who are interested in asst4-trainium are comparing it to the libraries listed below.
- ☆202 Updated 2 weeks ago
- ☆79 Updated 6 months ago
- extensible collectives library in triton ☆86 Updated last month
- ☆34 Updated last month
- Attention in SRAM on Tenstorrent Grayskull ☆35 Updated 9 months ago
- An experimental CPU backend for Triton ☆110 Updated 2 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆310 Updated 2 weeks ago
- Cataloging released Triton kernels. ☆220 Updated 4 months ago
- ☆68 Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 Updated last week
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆131 Updated 9 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆102 Updated this week
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM