j4orz / compilers

heterogeneous compilers

☆15

Alternatives and similar repositories for compilers

Users that are interested in compilers are comparing it to the libraries listed below


jax-ml / scaling-book
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
☆562 · Updated last week
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆138 · Updated last year
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆135 · Updated last year
obadakhalili / tinygrad-tensor-puzzles
Solve puzzles to improve your tinygrad skills!
☆142 · Updated 5 months ago
Quentin-Anthony / nanoMPI
Simple MPI implementation for prototyping or learning
☆279 · Updated 3 weeks ago
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆189 · Updated last year
Maharshi-Pandya / cudacodes
Learnings and programs related to CUDA
☆416 · Updated 2 months ago
nreHieW / r-nn
Tensor library with autograd using only Rust's standard library
☆69 · Updated last year
mesozoic-egg / tinygrad-notes
Tutorials on tinygrad
☆406 · Updated 3 weeks ago
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆86 · Updated this week
thomasahle / tensorgrad
Machine Learning with Symbolic Tensors
☆328 · Updated 3 months ago
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆393 · Updated 5 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆140 · Updated last year
modular / mojo-gpu-puzzles
Learn GPU Programming in Mojo🔥 by Solving Puzzles
☆124 · Updated last week
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆191 · Updated 3 months ago
modal-labs / gpu-glossary
GPU documentation for humans
☆131 · Updated this week
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆51 · Updated this week
srush / Autodiff-Puzzles
☆450 · Updated 10 months ago
spikedoanz / from-bits-to-intelligence
could we make an ml stack in 100,000 lines of code?
☆46 · Updated last year
arpitingle / gpu-alpha
High Quality Resources on GPU Programming/Architecture
☆589 · Updated last year
geohotstan / tinycorp-meetings
☆89 · Updated last week
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆150 · Updated last week
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
☆537 · Updated this week
rkinas / cuda-learning
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast…
☆377 · Updated 6 months ago
rwitten / HighPerfLLMs2024
☆530 · Updated last year
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆111 · Updated this week
tugot17 / pmpp
Complete solutions to the Programming Massively Parallel Processors Edition 4
☆483 · Updated 2 months ago
nebius / kvax
A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.
☆140 · Updated 4 months ago
smolorg / smolgrad
small auto-grad engine inspired by Karpathy's micrograd and PyTorch
☆276 · Updated 9 months ago
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45 · Updated 5 months ago