dianhsu / swin-transformer-cppLinks

Swin Transformer C++ Implementation

☆62

Alternatives and similar repositories for swin-transformer-cpp

Users that are interested in swin-transformer-cpp are comparing it to the libraries listed below

Sorting:

dianhsu / transformer-cpp-cpu
用C++实现一个简单的Transformer模型。 Attention Is All You Need。
☆48Updated 4 years ago
leimao / PyTorch-Quantization-Aware-Training
PyTorch Quantization Aware Training Example
☆137Updated last year
MegEngine / cutlass
CUDA Templates for Linear Algebra Subroutines
☆100Updated last year
li199603 / sgemm_with_cuda
SGEMM optimization with cuda step by step
☆20Updated last year
UDC-GAC / openCNN
A Winograd Minimal Filter Implementation in CUDA
☆25Updated 3 years ago
Qualcomm-AI-research / FP8-quantization
☆153Updated 2 years ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆63Updated 10 months ago
xxxxyu / FlexNN
Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"
☆55Updated 5 months ago
Syencil / Programming_Massively_Parallel_Processors
CUDA 6大并行计算模式代码与笔记
☆60Updated 4 years ago
MegEngine / examples
A set of examples around MegEngine
☆31Updated last year
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆95Updated 6 years ago
hova88 / CUDA-MatMul-Practice
☆17Updated last year
JieRen98 / SGEMM-SASS-Annotation
☆21Updated 4 years ago
mrzhuzhe / riven
CPU Memory Compiler and Parallel programing
☆26Updated 7 months ago
zeroine / cutlass-cute-sample
☆37Updated last year
njuhope / cuda_sgemm
☆113Updated last year
carlushuang / cpu_gemm_opt
how to design cpu gemm on x86 with avx256, that can beat openblas.
☆70Updated 6 years ago
BBuf / how-to-optimize-gemm
☆97Updated 3 years ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆202Updated 11 months ago
OpenPPL / ppl.kernel.cuda
☆37Updated 9 months ago
paramhanji / CUDA-CNN
Implementation of a simple CNN using CUDA
☆68Updated 8 years ago
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆110Updated 7 months ago
qiaolian9 / Torch2Tensor
A easy tool for generating Tensor Program from Torch(besd on Torch FX & TVM Relax)
☆11Updated 2 years ago
tigert1998 / qat
Manually implemented quantization-aware training
☆21Updated 2 years ago
OpenPPL / ppl.pmx
☆59Updated 7 months ago
jundaf2 / CUDA-INT8-GEMM
CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API
☆31Updated last year
pigirons / conv3x3_m1
This is a demo how to write a high performance convolution run on apple silicon
☆54Updated 3 years ago
fastmachinelearning / qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
☆149Updated 2 weeks ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆91Updated last year
OpenPPL / ppl.kernel.cpu
☆17Updated last year