h3ct0rjs / HighPerformanceComputingLinks
High Performance Computing class taken at U.T.P. (2017)
☆87 · Updated 8 years ago
Alternatives and similar repositories for HighPerformanceComputing
Users interested in HighPerformanceComputing are comparing it to the libraries listed below.
- Examples and exercises from the book Programming Massively Parallel Processors: A Hands-on Approach, by David B. Kirk and Wen-mei W. Hwu (T… ☆75 · Updated 4 years ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆158 · Updated last week
- CUDA Matrix Multiplication Optimization ☆239 · Updated last year
- CUDA Learning guide ☆477 · Updated last year
- NVIDIA tools guide ☆147 · Updated 10 months ago
- A set of hands-on tutorials for CUDA programming ☆241 · Updated last year
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆531 · Updated 2 months ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆58 · Updated 8 months ago
- 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆402 · Updated 3 months ago
- Step-by-step optimization of CUDA SGEMM ☆399 · Updated 3 years ago
- Examples from Programming in Parallel with CUDA ☆165 · Updated 2 years ago
- CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. … ☆459 · Updated 2 years ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆244 · Updated 6 months ago
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content] ☆74 · Updated 3 years ago
- CUDA tutorials for maths and ML with examples; covers multi-GPU, fused attention, Winograd convolution, and reinforcement learning. ☆200 · Updated 5 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆195 · Updated 2 years ago
- Awesome resources for GPUs ☆601 · Updated 2 years ago
- Some CUDA example code with READMEs. ☆178 · Updated last week
- 🌈 Solutions of LeetGPU ☆52 · Updated last week
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆127 · Updated 6 months ago
- Solution of Programming Massively Parallel Processors ☆50 · Updated last year
- An experimental CPU backend for Triton ☆160 · Updated last week
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆166 · Updated 10 months ago
- Fast CUDA matrix multiplication from scratch ☆946 · Updated 2 months ago
- Machine Learning Compiler Road Map ☆45 · Updated 2 years ago
- Cataloging released Triton kernels. ☆267 · Updated 2 months ago