stanford-cs149 / asst4-trainium
☆27 · Updated 6 months ago
Alternatives and similar repositories for asst4-trainium
Users who are interested in asst4-trainium are comparing it to the libraries listed below.
- ☆37 · Updated 3 weeks ago
- ☆219 · Updated this week
- TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code… ☆93 · Updated last week
- Project showing how to develop NKI kernels for Llama 3.2 1B inference ☆14 · Updated 3 weeks ago
- Extensible collectives library in Triton ☆86 · Updated 2 months ago
- ☆28 · Updated 5 months ago
- ☆20 · Updated last month
- Effective transpose on Hopper GPU ☆23 · Updated last month
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆134 · Updated last year
- Ring-attention experiments ☆144 · Updated 8 months ago
- ☆72 · Updated last year
- MLIR-based partitioning system ☆97 · Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆90 · Updated 2 weeks ago
- Cataloging released Triton kernels. ☆238 · Updated 5 months ago
- An experimental CPU backend for Triton ☆126 · Updated 3 weeks ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆166 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 3 months ago
- ☆81 · Updated 7 months ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆59 · Updated last week
- ☆16 · Updated 9 months ago
- ☆23 · Updated 7 months ago
- ☆15 · Updated 2 years ago
- Evaluating Large Language Models for CUDA Code Generation: ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆50 · Updated last week
- A schedule language for large model training ☆149 · Updated last year
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆61 · Updated 5 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆41 · Updated last year
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆137 · Updated 10 months ago
- A Data-Centric Compiler for Machine Learning ☆84 · Updated last year
- Framework to reduce autotune overhead to zero for well-known deployments. ☆77 · Updated last week