ROCm / tensorcast
☆15 · Updated 3 weeks ago
Alternatives and similar repositories for tensorcast
Users that are interested in tensorcast are comparing it to the libraries listed below
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆324 · Updated 5 months ago
- IREE plugin repository for the AMD AIE accelerator ☆113 · Updated last week
- ☆119 · Updated last week
- ☆109 · Updated last year
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆112 · Updated last year
- ☆166 · Updated 2 years ago
- Dissecting NVIDIA GPU Architecture ☆112 · Updated 3 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆141 · Updated 2 years ago
- Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures. ☆431 · Updated 2 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆102 · Updated this week
- ☆159 · Updated this week
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations ☆180 · Updated 3 years ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆117 · Updated 3 years ago
- ☆46 · Updated 5 months ago
- Official implementation of EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?" ☆24 · Updated 2 years ago
- An extension library of WMMA API (Tensor Core API) ☆109 · Updated last year
- A Winograd Minimal Filter Implementation in CUDA ☆28 · Updated 4 years ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆55 · Updated 2 years ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆166 · Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆234 · Updated 3 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆122 · Updated 3 weeks ago
- ☆47 · Updated 4 years ago
- OpenDNN: An Open-source, cuDNN-like Deep Learning Primitive Library ☆25 · Updated 5 years ago
- The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their per… ☆52 · Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆138 · Updated 2 years ago
- ☆19 · Updated 3 years ago
- Small set of gdb commands for useful tasks in tvm ☆22 · Updated 4 months ago
- ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference ☆171 · Updated this week
- ☆36 · Updated 3 years ago
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop ☆60 · Updated last month