geohot / tt-tiny
tiny code to access tenstorrent blackhole
☆61 Updated 8 months ago
Alternatives and similar repositories for tt-tiny
Users who are interested in tt-tiny are comparing it to the libraries listed below.
- RDNA3 emulator ☆55 Updated 9 months ago
- Tensor library & inference framework for machine learning ☆117 Updated 4 months ago
- The Quasi Quantum Assembly Programming Language ☆36 Updated 2 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 Updated last year
- Pytorch script hot swap: Change code without unloading your LLM from VRAM ☆125 Updated 9 months ago
- Tensor library with autograd using only Rust's standard library ☆71 Updated last year
- Custom PTX Instruction Benchmark ☆138 Updated 11 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Updated 10 months ago
- SIMD quantization kernels ☆94 Updated 5 months ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆215 Updated 2 years ago
- An implementation of delta-iris in tinygrad ☆72 Updated last year
- Standalone commandline CLI tool for compiling Triton kernels ☆20 Updated last year
- High-Performance FP32 GEMM on CUDA devices ☆117 Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆417 Updated last month
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆203 Updated 4 months ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆141 Updated 4 months ago
- C API for MLX ☆172 Updated last week
- ☆250 Updated last year
- Tenstorrent console based hardware information program ☆58 Updated last week
- Train neural networks that distill into logic circuits, using JAX ☆64 Updated 8 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆220 Updated last year
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆441 Updated this week
- Learning about CUDA by writing PTX code. ☆152 Updated last year
- Cuq: A MIR-to-Coq Framework Targeting PTX for Formal Semantics and Verified Translation of Rust GPU Kernels ☆124 Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆71 Updated this week
- parallelized hyperdimensional tictactoe ☆126 Updated last year
- Samples of good AI generated CUDA kernels ☆99 Updated 8 months ago
- ☆466 Updated 2 months ago
- An implementation of bucketMul LLM inference ☆224 Updated last year
- webgpu autograd library ☆33 Updated 8 months ago