tiny-tpu-v2 / tiny-tpu
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
☆967Updated 2 months ago
Alternatives and similar repositories for tiny-tpu
Users who are interested in tiny-tpu are comparing it to the repositories listed below:
- An open-source reimplementation of Google's Tensor Processing Unit (TPU).☆706Updated 7 years ago
- A machine learning accelerator core designed for energy-efficient AI at the edge.☆1,530Updated this week
- Run 64-bit Linux on LiteX + RocketChip☆202Updated 2 weeks ago
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1.☆185Updated last year
- ☆291Updated this week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆381Updated last week
- Nvidia Instruction Set Specification Generator☆297Updated last year
- GPU documentation for humans☆347Updated 3 weeks ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆518Updated last month
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"☆166Updated last year
- Tenstorrent TT-BUDA Repository☆316Updated 6 months ago
- Machine-Learning Accelerator System Exploration Tools☆179Updated 3 weeks ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r…☆167Updated last year
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s…☆126Updated last week
- Open-source GPU, in Verilog, loosely based on the RISC-V ISA☆1,106Updated 11 months ago
- A tiny CPU simulator written in Python☆320Updated this week
- TT-NN operator library, and TT-Metalium low level kernel programming model.☆1,233Updated last week
- Tenstorrent MLIR compiler☆199Updated this week
- ☆107Updated last year
- kernels, of the mega variety☆587Updated 3 weeks ago
- ☆443Updated 2 months ago
- Quantized LLM training in pure CUDA/C++.☆206Updated last week
- My submission for the GPUMODE/AMD fp8 mm challenge☆29Updated 4 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆164Updated this week
- Visualization of cache-optimized matrix multiplication☆155Updated 7 months ago
- Simple MPI implementation for prototyping or learning☆286Updated 2 months ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator☆214Updated last year
- An MLIR-based toolchain for AMD AI Engine-enabled devices.☆512Updated this week
- Ocelot: The Berkeley Out-of-Order Machine With V-EXT support☆188Updated this week
- Learning about CUDA by writing PTX code.☆145Updated last year