mrcat2018 / AutodiffEngine
AutodiffEngine
☆13Updated 6 years ago
Alternatives and similar repositories for AutodiffEngine:
Users who are interested in AutodiffEngine are comparing it to the libraries listed below.
- InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.☆66Updated 3 years ago
- Place for meetup slides☆140Updated 4 years ago
- A Fast Multi-processing BERT-Inference System☆101Updated 2 years ago
- Triton Compiler related materials.☆28Updated 4 months ago
- flexible-gemm conv of deepcore☆17Updated 5 years ago
- Write an automatic differentiation tool in 200 lines☆51Updated 5 years ago
- play gemm with tvm☆91Updated last year
- Inference framework for MoE layers based on TensorRT with Python binding☆41Updated 3 years ago
- ☆127Updated 3 years ago
- An experimental ahead of time compiler for Relay.☆50Updated 5 years ago
- An automatic differentiation framework with dynamic graph support☆101Updated 5 years ago
- Tutorial code on how to build your own Deep Learning System in 2k Lines☆125Updated 8 years ago
- The quantitative performance comparison among DL compilers on CNN models.☆74Updated 4 years ago
- ☆91Updated last month
- A demo of how to write a high-performance convolution that runs on Apple silicon☆54Updated 3 years ago
- notes on reading tensorflow source code☆13Updated 6 years ago
- ☆22Updated 5 years ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆181Updated 3 months ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems☆59Updated 2 years ago
- A home for the final text of all TVM RFCs.☆102Updated 7 months ago
- (Spring 2018) Assignment 2: Graph Executor with TVM☆124Updated 7 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆91Updated 6 years ago
- How to design a CPU GEMM on x86 with avx256 that can beat OpenBLAS.☆70Updated 6 years ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616☆132Updated last year
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆36Updated 2 months ago
- Training neural networks in TensorFlow 2.0 with 5x less memory☆131Updated 3 years ago
- ☆193Updated 2 years ago
- TensorFlow and TVM integration☆37Updated 5 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections☆121Updated 2 years ago
- ☆148Updated 4 months ago