ACANETS / eece-6540-labs
Labs and exercises for EECE.6540 Heterogeneous Computing at UMass Lowell
☆13Updated last year
Alternatives and similar repositories for eece-6540-labs:
Users who are interested in eece-6540-labs are comparing it to the libraries listed below
- Accelerating CNN's convolution operation on GPUs by using memory-efficient data access patterns.☆14Updated 7 years ago
- Yet another Polyhedra Compiler for DeepLearning☆19Updated last year
- Accelerate convolution neural network for face recognition using GPU☆12Updated 4 years ago
- BiSUNA framework specialized to compile for the Xilinx Alveo U50☆13Updated 4 years ago
- Learn NVDLA by SOMNIA☆32Updated 5 years ago
- An external memory allocator example for PyTorch.☆14Updated 3 years ago
- [TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA☆16Updated 2 years ago
- ☆30Updated last year
- TensorCore Vector Processor for Deep Learning - Google Summer of Code Project☆21Updated 3 years ago
- Express DLA implementation for FPGA, revised based on NVDLA.☆9Updated 5 years ago
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation☆27Updated 5 years ago
- This is a demo of how to write a high-performance convolution that runs on Apple silicon☆52Updated 3 years ago
- An 8-/16-/32-/64-bit floating point number family☆17Updated 3 years ago
- ☆10Updated 6 months ago
- ☆18Updated 2 years ago
- Sandbox for TVM and playing around!☆22Updated 2 years ago
- Benchmark scripts for TVM☆73Updated 2 years ago
- ☆23Updated 3 years ago
- Study of Ampere's Sparse Matmul☆16Updated 4 years ago
- [FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Desi…☆25Updated 2 years ago
- ☆15Updated 3 years ago
- The code for paper: Neuralpower: Predict and deploy energy-efficient convolutional neural networks☆21Updated 5 years ago
- Provides the code for the paper "EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators" by Luk…☆17Updated 5 years ago
- This is the implementation for paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).☆13Updated 3 years ago
- ☆19Updated 4 months ago
- The code for Joint Neural Architecture Search and Quantization☆13Updated 5 years ago
- The code for our paper "Neural Architecture Search as Program Transformation Exploration"☆18Updated 3 years ago
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.☆56Updated this week
- Fork of upstream onnxruntime focused on supporting risc-v accelerators☆83Updated last year
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms☆10Updated last year