geohot / tt-tiny
Tiny code to access Tenstorrent Blackhole.
☆61 · Updated 7 months ago
Alternatives and similar repositories for tt-tiny
Users interested in tt-tiny are comparing it to the libraries listed below:
- RDNA3 emulator ☆55 · Updated 9 months ago
- Tensor library & inference framework for machine learning ☆118 · Updated 3 months ago
- SIMD quantization kernels ☆93 · Updated 4 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- PyTorch script hot swap: change code without unloading your LLM from VRAM ☆125 · Updated 8 months ago
- ☆43 · Updated 3 weeks ago
- An implementation of delta-iris in tinygrad ☆72 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆318 · Updated last week
- PTX-Tutorial Written Purely By AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 9 months ago
- Custom PTX Instruction Benchmark ☆137 · Updated 10 months ago
- The Quasi Quantum Assembly Programming Language ☆36 · Updated 2 months ago
- High-Performance SGEMM on CUDA devices ☆115 · Updated 11 months ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r… ☆170 · Updated last year
- Tenstorrent console-based hardware information program ☆58 · Updated this week
- ☆87 · Updated 2 weeks ago
- [Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path ☆61 · Updated 2 weeks ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆141 · Updated 4 months ago
- ☆250 · Updated last year
- Fast and Furious AMD Kernels ☆336 · Updated this week
- ☆153 · Updated 2 weeks ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆215 · Updated 2 years ago
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆773 · Updated this week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆200 · Updated 3 months ago
- Learning about CUDA by writing PTX code. ☆151 · Updated last year
- C API for MLX ☆159 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆66 · Updated last week
- FP4 MAC Array ☆19 · Updated last year
- LLM training in simple, raw C/Metal Shading Language ☆61 · Updated last year
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆439 · Updated last month
- Because it's there. ☆16 · Updated last year