tiny-tpu-v2 / tiny-tpu
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
☆905Updated 3 weeks ago
Alternatives and similar repositories for tiny-tpu
Users who are interested in tiny-tpu compare it to the repositories listed below.
- An open-source reimplementation of Google's Tensor Processing Unit (TPU).☆705Updated 7 years ago
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1.☆183Updated last year
- Run 64-bit Linux on LiteX + RocketChip☆201Updated last month
- Nvidia Instruction Set Specification Generator☆293Updated last year
- ☆287Updated last week
- TT-NN operator library, and TT-Metalium low level kernel programming model.☆1,118Updated last week
- GPU documentation for humans☆227Updated last week
- Machine-Learning Accelerator System Exploration Tools☆175Updated 3 months ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r…☆165Updated last year
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit☆161Updated last year
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆342Updated this week
- ☆102Updated last year
- Algebraic enhancements for GEMM & AI accelerators☆279Updated 6 months ago
- Tenstorrent TT-BUDA Repository☆316Updated 5 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆487Updated this week
- Exocompilation for productive programming of hardware accelerators☆657Updated last week
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA☆1,082Updated 9 months ago
- Open source machine learning accelerators☆388Updated last year
- Visualization of cache-optimized matrix multiplication☆155Updated 6 months ago
- Allo: A Programming Model for Composable Accelerator Design☆276Updated this week
- Ocelot: The Berkeley Out-of-Order Machine With V-EXT support☆176Updated 3 weeks ago
- Tenstorrent MLIR compiler☆183Updated this week
- Unofficial description of the CUDA assembly (SASS) instruction sets.☆142Updated last month
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator☆213Updated last year
- Custom PTX Instruction Benchmark☆126Updated 6 months ago
- ☆450Updated 5 months ago
- Learning about CUDA by writing PTX code.☆135Updated last year
- An MLIR-based toolchain for AMD AI Engine-enabled devices.☆480Updated this week
- Tensor library & inference framework for machine learning☆110Updated 2 weeks ago
- ☆413Updated 3 weeks ago