MichalPitr / inference_engine
Inference engine from scratch
☆18 Updated 10 months ago
Alternatives and similar repositories for inference_engine
Users who are interested in inference_engine are comparing it to the libraries listed below.
- Some CUDA example code with READMEs. ☆176 Updated 8 months ago
- CUDA Learning guide ☆467 Updated last year
- 100 days of CUDA Challenge ☆47 Updated 3 months ago
- ☆193 Updated last year
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆166 Updated 10 months ago
- Examples from Programming in Parallel with CUDA ☆164 Updated 2 years ago
- NVIDIA tools guide ☆145 Updated 10 months ago
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆803 Updated this week
- Fast CUDA matrix multiplication from scratch ☆928 Updated 2 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆528 Updated last month
- ☆77 Updated last year
- Class of High Performance Computing taken at U.T.P 2017 ☆86 Updated 8 years ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆116 Updated last week
- Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sort… ☆18 Updated 2 years ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆192 Updated 2 years ago
- Stanford CS149 -- Assignment 1 ☆129 Updated 3 weeks ago
- ☆376 Updated last month
- LLM training in simple, raw C/CUDA ☆107 Updated last year
- Learn GPU Programming in Mojo🔥 by Solving Puzzles ☆195 Updated last week
- Step-by-step optimization of CUDA SGEMM ☆390 Updated 3 years ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 Updated 6 months ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆248 Updated last year
- ☆498 Updated this week
- TransformerCPP is a minimal C++ machine learning library with autograd and tensor ops, inspired by PyTorch. It includes a from-scratch Tr… ☆35 Updated last month
- An experimental CPU backend for Triton ☆157 Updated last week
- Examples from the "C++ From Scratch" Series ☆95 Updated 2 years ago
- CUDA Matrix Multiplication Optimization ☆235 Updated last year
- Learning about CUDA by writing PTX code. ☆146 Updated last year
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆886 Updated last year
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆217 Updated 3 years ago