Quadrollopo / Alexnet_Cuda

A CUDA implementation of AlexNet

☆8

Alternatives and similar repositories for Alexnet_Cuda

Users who are interested in Alexnet_Cuda are comparing it to the libraries listed below.

jundaf2 / eigenMHA
Forward and backward Attention DNN operators implemented with LibTorch, cuDNN, and Eigen.
☆30 Updated 2 years ago
uysalere / cuda-matrix-vector-multiplication
Matrix-Vector Multiplication Using Shared and Coalesced Memory Access
☆16 Updated 12 years ago
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆192 Updated 6 months ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆452 Updated 11 months ago
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆895 Updated this week
nDIRECT / nDIRECT
A direct convolution library targeting ARM multi-core CPUs.
☆12 Updated 8 months ago
buddy-compiler / buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
☆613 Updated last week
qiaolian9 / Torch2Tensor
An easy tool for generating Tensor Programs from Torch (based on Torch FX & TVM Relax)
☆11 Updated 2 years ago
tlc-pack / relax
☆196 Updated 2 years ago
nicolaswilde / cuda-sgemm
☆67 Updated 7 months ago
tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆649 Updated last year
ThanatosShinji / onnx-tool
A parser, editor and profiler tool for ONNX models.
☆450 Updated last week
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆341 Updated 3 years ago
Hardware-Alchemy / cuDNN-sample
cuDNN sample code provided by NVIDIA
☆46 Updated 6 years ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆63 Updated 11 months ago
microsoft / microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
☆270 Updated last month
HuangShiqing / LearnAndTry
☆19 Updated last month
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆370 Updated 10 months ago
dianhsu / swin-transformer-cpp
Swin Transformer C++ Implementation
☆63 Updated 4 years ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆214 Updated last year
siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆794 Updated last year
NVIDIA / TensorRT-Incubator
Experimental projects related to TensorRT
☆110 Updated this week
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆371 Updated 7 months ago
reed-lau / cute-gemm
☆129 Updated 8 months ago
supernova-ge / Jipitauri
Georgian chat bot based on GPT-3
☆10 Updated 8 months ago
sophgo / tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
☆769 Updated this week
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆96 Updated 7 years ago
Archermmt / tvm_walk_through
Code reading for TVM
☆76 Updated 3 years ago
QimingZheng / gemmlab
☆23 Updated 3 years ago
RRZE-HPC / gpu-benches
Collection of benchmarks to measure basic GPU capabilities
☆404 Updated 6 months ago