vaibhawvipul / performance-engineering
☆30Updated 3 years ago
Alternatives and similar repositories for performance-engineering
Users that are interested in performance-engineering are comparing it to the libraries listed below
- LLM training in simple, raw C/CUDA☆112Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best!☆71Updated this week
- High-Performance FP32 GEMM on CUDA devices☆117Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools☆40Updated 6 months ago
- A curriculum for learning about gpu performance engineering, from scratch to what the frontier AI labs do☆349Updated 3 weeks ago
- Some CUDA example code with READMEs.☆179Updated 2 months ago
- ☆95Updated this week
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA☆182Updated last year
- Learning about CUDA by writing PTX code.☆152Updated last year
- Custom PTX Instruction Benchmark☆138Updated 11 months ago
- ☆15Updated 3 months ago
- Parallel framework for training and fine-tuning deep neural networks☆70Updated 3 months ago
- Benchmarks to capture important workloads.☆32Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆49Updated 5 months ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code☆257Updated last year
- ☆28Updated last year
- 100 days of CUDA Challenge☆47Updated 6 months ago
- Home for OctoML PyTorch Profiler☆113Updated 2 years ago
- ☆25Updated 3 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆194Updated this week
- Custom kernels in Triton language for accelerating LLMs☆27Updated last year
- Hand-Rolled GPU communications library☆82Updated 2 months ago
- ☆19Updated 9 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API☆93Updated 2 years ago
- ☆12Updated 5 months ago
- Ship correct and fast LLM kernels to PyTorch☆140Updated 3 weeks ago
- benchmarking some transformer deployments☆26Updated last month
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆201Updated this week
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".☆103Updated 4 months ago
- CS294 AI Systems Class Website☆17Updated 3 years ago