MegEngine / cutlass
CUDA Templates for Linear Algebra Subroutines
☆90 · Updated 4 months ago
Related projects:
- A converter from MegEngine to other frameworks ☆67 · Updated last year
- A set of examples around MegEngine ☆29 · Updated 9 months ago
- ☆100 · Updated 5 months ago
- ☆92 · Updated 3 years ago
- Swin Transformer C++ Implementation ☆53 · Updated 3 years ago
- ☆44 · Updated 3 years ago
- NART (NART is not A RunTime), a deep learning inference framework ☆38 · Updated last year
- ☆113 · Updated last year
- Offline quantization tools for deployment ☆109 · Updated 8 months ago
- ☆212 · Updated last year
- TensorRT 2022 finals solution: TensorRT inference optimization for MST++, the first Transformer-based image restoration model ☆135 · Updated 2 years ago
- ☆77 · Updated last year
- FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch ☆32 · Updated 2 years ago
- Common libraries for PPL projects ☆28 · Updated last week
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆191 · Updated 2 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆82 · Updated 6 months ago
- A simple high-performance CUDA GEMM implementation ☆319 · Updated 8 months ago
- Post-training quantization for Vision Transformers ☆176 · Updated 2 years ago
- Slides with modifications for a course at Tsinghua University ☆57 · Updated 2 years ago
- ☆32 · Updated 3 months ago
- Play GEMM with TVM ☆81 · Updated last year
- Fast CUDA kernels for ResNet inference ☆164 · Updated 5 years ago
- Manually implemented quantization-aware training ☆21 · Updated last year
- Inference of quantization-aware trained networks using TensorRT ☆77 · Updated last year
- ☆90 · Updated 6 months ago
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆52 · Updated 2 years ago
- CVFusion is an open-source deep learning compiler that fuses OpenCV operators ☆26 · Updated 2 years ago
- Code and notes for the six major CUDA parallel computing patterns ☆57 · Updated 4 years ago
- Symmetric int8 GEMM ☆66 · Updated 4 years ago
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization ☆96 · Updated 2 years ago
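Several of the projects listed above concern quantized GEMM (symmetric int8 GEMM, the fp16/quantized-weight kernel, quantization-aware training). As a rough illustration of the common idea behind them, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization with int32 accumulation; all function names are illustrative and not taken from any of the listed repositories:

```python
import numpy as np

def symmetric_quantize(x, num_bits=8):
    """Symmetric per-tensor quantization: one scale maps max(|x|) onto the signed range."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int8_gemm(a, b):
    """Quantize fp32 inputs, multiply with int32 accumulation, then dequantize."""
    qa, sa = symmetric_quantize(a)
    qb, sb = symmetric_quantize(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)     # avoids int8 overflow
    return acc.astype(np.float32) * (sa * sb)           # dequantize the product

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 32)).astype(np.float32)
b = rng.standard_normal((32, 16)).astype(np.float32)
err = np.abs(int8_gemm(a, b) - a @ b).max()             # small quantization error vs. fp32
```

Real kernels (e.g. those built on CUTLASS) perform the same int32 accumulation in hardware dot-product instructions and typically use per-channel rather than per-tensor scales, but the arithmetic is the same.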