vicharak-in / Axon-NPU-Guide
This repository contains a guide on how to set up the toolkits needed to use the NPU on Axon for running various CNN models.
☆22 · Updated 3 months ago
Alternatives and similar repositories for Axon-NPU-Guide
Users who are interested in Axon-NPU-Guide are comparing it to the libraries listed below.
- NeuroBLAST v3 architecture code ☆36 · Updated 3 weeks ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆53 · Updated 10 months ago
- Setting up VS Code to work with PyTorch in C/C++ with CUDA support ☆25 · Updated 11 months ago
- Inference Llama 2 in C++ ☆43 · Updated last year
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Updated 7 months ago
- ☆50 · Updated last month
- Implementing DL from scratch using first principles ☆25 · Updated 3 weeks ago
- A lightweight Python-based GPU architecture simulator that demonstrates how parallel threads, registers, memory, and instructions work on… ☆39 · Updated last week
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆64 · Updated 8 months ago
- Andrej Karpathy's micrograd implemented in C ☆30 · Updated last year
- PyTorch from scratch in pure C/CUDA and Python ☆40 · Updated last year
- ~950-line, minimal, extensible LLM inference engine built from scratch. ☆405 · Updated 3 weeks ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 · Updated 4 months ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.) ☆53 · Updated last year
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX. ☆52 · Updated last year
- Neural networks with low-bit weights on low-end 32-bit microcontrollers such as the CH32V003 RISC-V microcontroller and others ☆310 · Updated 3 weeks ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆94 · Updated this week
- ☆90 · Updated last month
- High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, … ☆140 · Updated last month
- Here's all my Python/Numba (CUDA) code for the encoder block I made :) ☆70 · Updated 9 months ago
- ☆42 · Updated last year
- Tenstorrent console-based hardware information program ☆58 · Updated this week
- ☆20 · Updated 3 months ago
- ☆94 · Updated last year
- The Official Nimbus SDK ☆210 · Updated this week
- ☆38 · Updated 11 months ago
- Code for paper https://arxiv.org/abs/2501.00522 ☆14 · Updated 9 months ago
- TT-Studio: An all-in-one platform to deploy and manage AI models optimized for Tenstorrent hardware with dedicated front-end demo applic… ☆39 · Updated last week
- EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU ☆50 · Updated last year
- Model compression for ONNX ☆99 · Updated last year