mrcat2018 / AutodiffEngine
☆13 · Updated 5 years ago

Alternatives and similar repositories for AutodiffEngine:
Users interested in AutodiffEngine are comparing it to the libraries listed below.
- InsNet runs instance-dependent neural networks with padding-free dynamic batching. ☆66 · Updated 3 years ago
- Place for meetup slides ☆140 · Updated 4 years ago
- Inference framework for MoE layers based on TensorRT with Python bindings ☆41 · Updated 3 years ago
- A fast multi-processing BERT inference system ☆101 · Updated 2 years ago
- Write an automatic differentiation tool in 200 lines ☆50 · Updated 5 years ago
- Play GEMM with TVM ☆89 · Updated last year
- Tutorial code on how to build your own Deep Learning System in 2k Lines ☆125 · Updated 7 years ago
- Reading the PyTorch source code, version 0.2.0 ☆90 · Updated 5 years ago
- How to design a CPU GEMM on x86 with AVX-256 that can beat OpenBLAS ☆68 · Updated 5 years ago
- ☆23 · Updated last year
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆54 · Updated 3 years ago
- A home for the final text of all TVM RFCs. ☆103 · Updated 5 months ago
- ☆22 · Updated 5 years ago
- ☆70 · Updated 2 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆119 · Updated 2 years ago
- flexible-gemm conv of deepcore ☆17 · Updated 5 years ago
- ☆194 · Updated last year
- ☆95 · Updated 3 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆35 · Updated 3 weeks ago
- TensorFlow and TVM integration ☆37 · Updated 4 years ago
- A hands-on tutorial on TVM's core principles ☆60 · Updated 4 years ago
- Materials related to the Triton compiler ☆28 · Updated 2 months ago
- Notes on reading the TensorFlow source code ☆13 · Updated 6 years ago
- ☆145 · Updated 2 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆106 · Updated 6 months ago
- A quantitative performance comparison among DL compilers on CNN models ☆74 · Updated 4 years ago
- A simple deep learning framework that supports automatic differentiation and GPU acceleration. ☆58 · Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆89 · Updated 3 weeks ago
- ☆82 · Updated last year
- Subpart of the source code of deepcore v0.7 ☆27 · Updated 4 years ago
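Several of the repositories above (AutodiffEngine itself, and the "write an automatic differentiation tool in 200 lines" entry) build small reverse-mode autodiff engines. As a rough illustration of the technique these projects implement, here is a minimal scalar reverse-mode sketch; the `Var` class and its methods are hypothetical names for this example, not the API of any repository listed.

```python
class Var:
    """A scalar that records its computation graph for reverse-mode autodiff.

    Illustrative sketch only; not the API of AutodiffEngine or any repo above.
    """

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents  # pairs of (parent Var, local gradient)

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value,
                   parents=((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for parent, _ in v._parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in v._parents:
                parent.grad += local * v.grad

x = Var(3.0)
y = Var(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # → 5.0 3.0
```

Operator overloading via `__add__`/`__mul__` is what lets ordinary-looking arithmetic build the graph implicitly; a full engine like those above adds more operators, tensors, and GPU kernels on top of the same idea.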