Mengjintao / FastCNN
☆18 Updated 2 years ago
Related projects
Alternatives and complementary repositories for FastCNN
- Yet another Polyhedra Compiler for DeepLearning ☆19 Updated last year
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆52 Updated 2 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆85 Updated 8 months ago
- ☆38 Updated 4 years ago
- ☆93 Updated 3 years ago
- Sandbox for TVM and playing around! ☆22 Updated last year
- Benchmark scripts for TVM ☆73 Updated 2 years ago
- Play with GEMM using TVM ☆84 Updated last year
- Symmetric int8 GEMM ☆66 Updated 4 years ago
- An external memory allocator example for PyTorch. ☆13 Updated 3 years ago
- OneFlow->ONNX ☆42 Updated last year
- ☆16 Updated this week
- A study of cutlass ☆19 Updated last week
- ☆22 Updated 7 months ago
- ☆17 Updated 7 months ago
- Optimize GEMM with Tensor Cores step by step ☆15 Updated 11 months ago
- An fp8 flash attention implementation on the Ada architecture, built with the cutlass repository ☆52 Updated 3 months ago
- flexible-gemm conv of deepcore ☆17 Updated 4 years ago
- Tencent Distribution of TVM ☆15 Updated last year
- ☆24 Updated last year
- ☆10 Updated 4 years ago
- ☆18 Updated last month
- CVFusion is an open-source deep learning compiler that fuses OpenCV operators. ☆26 Updated 2 years ago
- Tutorials on extending and importing TVM with a CMake include dependency. ☆11 Updated last month
- ☆67 Updated last year
- ☆14 Updated 2 years ago
- Quantization-aware training package for NCNN on PyTorch ☆68 Updated 3 years ago