h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆56 Updated 7 years ago
Alternatives and similar repositories for HighPerformanceComputing
Users who are interested in HighPerformanceComputing compare it to the repositories listed below
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆66 Updated 4 years ago
- CUDA Matrix Multiplication Optimization ☆186 Updated 9 months ago
- Reference Kernels for the Leaderboard ☆45 Updated this week
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆33 Updated 2 months ago
- Examples from Programming in Parallel with CUDA ☆141 Updated 2 years ago
- 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆262 Updated 2 weeks ago
- ☆168 Updated last year
- ☆153 Updated 9 months ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆61 Updated 8 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆181 Updated 3 months ago
- Step-by-step optimization of CUDA SGEMM ☆317 Updated 3 years ago
- CUDA tutorials or Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning. ☆182 Updated last month
- A layered, decoupled deep learning inference engine ☆73 Updated 2 months ago
- Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sort… ☆15 Updated last year
- A simple high performance CUDA GEMM implementation. ☆366 Updated last year
- Implement Neural Networks in CUDA from Scratch ☆23 Updated 11 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆357 Updated 4 months ago
- Machine Learning Compiler Road Map ☆44 Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆130 Updated last year
- IMPACT GPU Algorithms Teaching Labs ☆57 Updated 2 years ago
- Cataloging released Triton kernels. ☆221 Updated 4 months ago
- A llama model inference framework implemented in CUDA C++ ☆55 Updated 6 months ago
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content] ☆66 Updated 2 years ago
- Solution of Programming Massively Parallel Processors ☆44 Updated last year
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆173 Updated last week
- Fastest kernels written from scratch ☆261 Updated last month
- Free resource for the book AI Compiler Development Guide ☆43 Updated 2 years ago
- Penn CIS 5650 (GPU Programming and Architecture) Final Project ☆30 Updated last year
- ☆102 Updated last month
- NVIDIA tools guide ☆132 Updated 4 months ago