MichalPitr / inference_engine

Inference engine from scratch

☆14

Alternatives and similar repositories for inference_engine

Users who are interested in inference_engine compare it to the repositories listed below.

drkennetz / cuda_examples
Some CUDA example code with READMEs.
☆99 Updated 3 months ago
CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆387 Updated 11 months ago
Infatoshi / mnist-cuda
☆257 Updated 4 months ago
tgautam03 / xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
☆117 Updated 4 months ago
gpu-mode / reference-kernels
Reference Kernels for the Leaderboard
☆55 Updated this week
essentialsofparallelcomputing / EssentialsOfParallelComputing
Main Book repository for the Parallel and High Performance Computing book, Manning Publications
☆208 Updated 3 years ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆189 Updated 10 months ago
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆133 Updated last year
h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆60 Updated 7 years ago
siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆735 Updated last year
RichardAns / CUDA-Programs
Examples from Programming in Parallel with CUDA
☆149 Updated 2 years ago
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆133 Updated 5 months ago
R100001 / Programming-Massively-Parallel-Processors
☆158 Updated 10 months ago
stanford-cs149 / cs149gpt
☆71 Updated last year
CoffeeBeforeArch / cpp_from_scratch
Examples from the "C++ From Scratch" Series
☆81 Updated 2 years ago
ThoenigAdrian / NeuralNetworksCudaTutorial
Implement Neural Networks in Cuda from Scratch
☆23 Updated last year
CisMine / GPU-in-ML-DL
Apply GPU in ML and DL
☆52 Updated 3 months ago
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆333 Updated 3 years ago
iiithf / concurrent-data-structures
A Concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads (or processes) on…
☆34 Updated last month
SzymonOzog / FastSoftmax
☆35 Updated 5 months ago
BobMcDear / neural-network-cuda
Neural network from scratch in CUDA/C++
☆80 Updated 4 months ago
stanford-cs149 / asst1
Stanford CS149 -- Assignment 1
☆107 Updated 8 months ago
loganwatchorn / notes-pmpp
Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.)
☆53 Updated 9 months ago
andreinechaev / nvcc4jupyter
A plugin for Jupyter Notebook to run CUDA C/C++ code
☆233 Updated 8 months ago
dawn-chu / EECS-368-Programming-Massively-Parallel-Processors-with-CUDA
☆20 Updated 9 years ago
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆54 Updated 5 months ago
astledsa / Deep-Learning-C
An implementation of a deep learning framework and models in C
☆48 Updated 2 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆229 Updated 4 months ago
quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high …
☆61 Updated 2 weeks ago
rbga / CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sort…
☆15 Updated last year