tiny-tpu-v2 / tiny-tpu
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
☆953 Updated last month
Alternatives and similar repositories for tiny-tpu
Users who are interested in tiny-tpu are comparing it to the repositories listed below.
- An open source reimplementation of Google's Tensor Processing Unit (TPU). ☆706 Updated 7 years ago
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1. ☆184 Updated last year
- Run 64-bit Linux on LiteX + RocketChip ☆203 Updated 2 months ago
- ☆290 Updated last week
- Tenstorrent TT-BUDA Repository ☆316 Updated 6 months ago
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆369 Updated last week
- GPU documentation for humans ☆313 Updated last week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆502 Updated 3 weeks ago
- Nvidia Instruction Set Specification Generator ☆293 Updated last year
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆162 Updated last year
- Machine-Learning Accelerator System Exploration Tools ☆178 Updated last week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,217 Updated last week
- Algebraic enhancements for GEMM & AI accelerators ☆280 Updated 7 months ago
- ☆105 Updated last year
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r… ☆166 Updated last year
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆118 Updated last week
- Tenstorrent MLIR compiler ☆186 Updated this week
- Open source machine learning accelerators ☆388 Updated last year
- kernels, of the mega variety ☆563 Updated last week
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆355 Updated 5 months ago
- Ocelot: The Berkeley Out-of-Order Machine With V-EXT support ☆180 Updated last week
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA ☆1,092 Updated 10 months ago
- Visualization of cache-optimized matrix multiplication ☆155 Updated 6 months ago
- Allo: A Programming Model for Composable Accelerator Design ☆283 Updated this week
- Complete solutions to Programming Massively Parallel Processors, 4th Edition ☆533 Updated 3 months ago
- Exocompilation for productive programming of hardware accelerators ☆667 Updated this week
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆213 Updated last year
- Learning about CUDA by writing PTX code. ☆138 Updated last year
- ☆78 Updated 2 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆592 Updated this week