tiny-tpu-v2 / tiny-tpu
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
☆467 Updated this week
Alternatives and similar repositories for tiny-tpu
Users who are interested in tiny-tpu are comparing it to the repositories listed below.
- Algebraic enhancements for GEMM & AI accelerators ☆278 Updated 5 months ago
- Run 64-bit Linux on LiteX + RocketChip ☆201 Updated 3 weeks ago
- Tensor library & inference framework for machine learning ☆107 Updated this week
- Tilus is a tile-level kernel programming language, implemented in Python. ☆115 Updated 2 weeks ago
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1. ☆167 Updated last year
- An open source reimplementation of Google's Tensor Processing Unit (TPU). ☆697 Updated 7 years ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆211 Updated last year
- Nvidia Instruction Set Specification Generator ☆289 Updated last year
- ☆248 Updated last year
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r… ☆165 Updated last year
- A configurable RTL to bitstream FPGA toolchain ☆41 Updated this week
- ☆197 Updated 3 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 Updated 4 months ago
- Custom PTX Instruction Benchmark ☆126 Updated 5 months ago
- Tenstorrent TT-BUDA Repository ☆315 Updated 4 months ago
- Exocompilation for productive programming of hardware accelerators ☆654 Updated this week
- tiny code to access tenstorrent blackhole ☆59 Updated 2 months ago
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit ☆159 Updated last year
- ☆401 Updated this week
- ☆187 Updated 11 months ago
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆126 Updated 4 months ago
- ☆284 Updated this week
- ☆1,048 Updated 3 months ago
- ☆449 Updated 4 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆194 Updated 9 months ago
- GPU documentation for humans ☆119 Updated this week
- Learning about CUDA by writing PTX code. ☆134 Updated last year
- Code sample showing how to run and benchmark models on Qualcomm's Windows PCs ☆101 Updated 10 months ago
- An RV32I-inspired ISA, SIMT GPU implementation in SystemVerilog. ☆199 Updated 6 months ago
- Visualization of cache-optimized matrix multiplication ☆155 Updated 5 months ago