MagellaX / StreamAttn
☆20 Updated last month
Alternatives and similar repositories for StreamAttn:
Users who are interested in StreamAttn are comparing it to the libraries listed below:
- Learning about CUDA by writing PTX code. ☆125 Updated last year
- Tensor library with autograd using only Rust's standard library ☆66 Updated 8 months ago
- Experimental GPU language with meta-programming ☆22 Updated 6 months ago
- pytorch from scratch in pure C/CUDA and python ☆40 Updated 5 months ago
- This repository contains a simple llama3 implementation in pure jax. ☆58 Updated last month
- Rust Implementation of micrograd ☆51 Updated 8 months ago
- look how they massacred my boy ☆63 Updated 5 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆60 Updated last week
- Extensive introductory writeup on Zig language functionalities ☆10 Updated 8 months ago
- ☆55 Updated 3 weeks ago
- moondream in zig. ☆58 Updated this week
- A really tiny autograd engine ☆90 Updated 11 months ago
- In this repository I have code and brief explanations of the attempts that I made at the ARC-AGI (2024) challenges :) ☆23 Updated 4 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆18 Updated 10 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆63 Updated 4 months ago
- minimal diffusion transformer in pytorch. ☆16 Updated 5 months ago
- Gradient descent is cool and all, but what if we could delete it? ☆103 Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆92 Updated 10 months ago
- An implementation of delta-iris in tinygrad ☆72 Updated 7 months ago
- Because it's there. ☆15 Updated 6 months ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations. ☆68 Updated last month
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 Updated last week
- realtime latent world model inference demo ☆44 Updated 4 months ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆110 Updated 2 months ago
- Simple Transformer in Jax ☆136 Updated 9 months ago
- Setting up VS Code to work with PyTorch in C/C++ with CUDA support ☆25 Updated last month
- ☆46 Updated 7 months ago
- ☆44 Updated 3 weeks ago
- ☆97 Updated 5 months ago