tinygrad / tinyos
☆42 · Updated this week
Alternatives and similar repositories for tinyos
Users who are interested in tinyos are comparing it to the libraries listed below.
- SIMD quantization kernels ☆93 · Updated 3 months ago
- Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings. ☆63 · Updated last year
- An implementation of delta-iris in tinygrad ☆72 · Updated last year
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆141 · Updated 3 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- tiny code to access tenstorrent blackhole ☆61 · Updated 6 months ago
- Quantized LLM training in pure CUDA/C++. ☆221 · Updated this week
- Learning about CUDA by writing PTX code. ☆149 · Updated last year
- ☆22 · Updated 11 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 · Updated 8 months ago
- Can RL solve simple problems? ☆54 · Updated last year
- Hand-Rolled GPU communications library ☆73 · Updated 2 weeks ago
- Modded vLLM to run pipeline parallelism over public networks ☆40 · Updated 6 months ago
- 👷 Build compute kernels ☆192 · Updated this week
- Custom PTX Instruction Benchmark ☆136 · Updated 9 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 · Updated last week
- Solidity contracts for the decentralized Prime Network protocol ☆27 · Updated 5 months ago
- Simple Transformer in Jax ☆139 · Updated last year
- A really tiny autograd engine ☆96 · Updated 6 months ago
- Tensor library with autograd using only Rust's standard library ☆70 · Updated last year
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆108 · Updated 9 months ago
- Solve puzzles to improve your tinygrad skills! ☆164 · Updated last month
- look how they massacred my boy ☆63 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆142 · Updated last week
- noise_step: Training in 1.58b With No Gradient Memory ☆221 · Updated 11 months ago
- ☆18 · Updated last month
- ☆138 · Updated last year
- ☆13 · Updated last year
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆120 · Updated 2 months ago
- High-Performance SGEMM on CUDA devices ☆113 · Updated 10 months ago