songqun / speedup-aarch64-cpu
A compute kernel implementation in an ML inference framework, aiming at the theoretical limit
☆11 Updated 4 years ago
Related projects
Alternatives and complementary repositories for speedup-aarch64-cpu
- A proof of concept of Intel VNNI instruction module. ☆10 Updated 4 years ago
- CNNs in Halide ☆23 Updated 9 years ago
- Symbolic Expression and Statement Module for new DSLs ☆206 Updated 4 years ago
- Visualize TVM Relay program graph ☆12 Updated 5 years ago
- flexible-gemm conv of deepcore ☆17 Updated 4 years ago
- put my presentation materials. ☆123 Updated 7 years ago
- An experimental ahead of time compiler for Relay. ☆51 Updated 4 years ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆102 Updated last year
- TensorFlow and TVM integration ☆38 Updated 4 years ago
- modified cutlass ☆14 Updated 4 years ago
- ☆20 Updated 2 years ago
- Main project, Compiler design PKU 2019 Spring ☆8 Updated 5 years ago
- Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git) ☆117 Updated 2 years ago
- ☆32 Updated 2 years ago
- Repository for SysML19 Artifacts Evaluation ☆53 Updated 5 years ago
- Subpart source code of deepcore v0.7 ☆27 Updated 4 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆18 Updated 8 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆78 Updated last year
- ICML2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial) ☆17 Updated 5 years ago
- assembler for NVIDIA FERMI. Imported from Google Code ☆70 Updated 9 years ago
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib ☆54 Updated last year
- Benchmark scripts for TVM ☆73 Updated 2 years ago
- Static analysis framework for analyzing programs written in TVM's Relay IR. ☆27 Updated 5 years ago
- This is a demo of how to write a high-performance convolution that runs on Apple silicon ☆52 Updated 2 years ago
- TVM tutorial ☆65 Updated 5 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆34 Updated 9 years ago
- This is a repo which contains some details about how to use the OpenCL backend (Xilinx/Intel). ☆24 Updated 5 years ago
- Qualcomm Hexagon NN Offload Framework ☆39 Updated 4 years ago
- Evaluating different memory managers for dynamic GPU memory ☆24 Updated 3 years ago
- TFLite python API package for parsing TFLite model ☆12 Updated 4 years ago