j4orz / compilers
heterogeneous compilers
☆15 · Updated 2 weeks ago
Alternatives and similar repositories for compilers
Users who are interested in compilers are comparing it to the repositories listed below.
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆562 · Updated last week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆138 · Updated last year
- Learning about CUDA by writing PTX code. ☆135 · Updated last year
- Solve puzzles to improve your tinygrad skills! ☆142 · Updated 5 months ago
- Simple MPI implementation for prototyping or learning ☆279 · Updated 3 weeks ago
- An implementation of the transformer architecture in an NVIDIA CUDA kernel ☆189 · Updated last year
- Learnings and programs related to CUDA ☆416 · Updated 2 months ago
- Tensor library with autograd using only Rust's standard library ☆69 · Updated last year
- Tutorials on tinygrad ☆406 · Updated 3 weeks ago
- SIMD quantization kernels ☆86 · Updated this week
- Machine Learning with Symbolic Tensors ☆328 · Updated 3 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆393 · Updated 5 months ago
- Simple Transformer in JAX ☆140 · Updated last year
- Learn GPU Programming in Mojo🔥 by Solving Puzzles ☆124 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆191 · Updated 3 months ago
- GPU documentation for humans ☆131 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆51 · Updated this week
- ☆450 · Updated 10 months ago
- Could we make an ML stack in 100,000 lines of code? ☆46 · Updated last year
- High Quality Resources on GPU Programming/Architecture ☆589 · Updated last year
- ☆89 · Updated last week
- Minimal yet performant LLM examples in pure JAX ☆150 · Updated last week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆537 · Updated this week
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆377 · Updated 6 months ago
- ☆530 · Updated last year
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆111 · Updated this week
- Complete solutions to Programming Massively Parallel Processors, 4th Edition ☆483 · Updated 2 months ago
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆140 · Updated 4 months ago
- Small autograd engine inspired by Karpathy's micrograd and PyTorch ☆276 · Updated 9 months ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆45 · Updated 5 months ago