naklecha / llm-inference-optimizations-explained

in this repository, i'm going to implement increasingly complex llm inference optimizations

☆68

Alternatives and similar repositories for llm-inference-optimizations-explained

Users that are interested in llm-inference-optimizations-explained are comparing it to the libraries listed below

VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72 Updated 6 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆277 Updated 3 weeks ago
kmohan321 / Research_Papers
☆46 Updated 6 months ago
joey00072 / Multi-Head-Latent-Attention-MLA-
working implementation of deepseek MLA
☆44 Updated 9 months ago
joey00072 / Tinytorch
A really tiny autograd engine
☆95 Updated 5 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆296 Updated 2 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆139 Updated last year
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆87 Updated last month
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆107 Updated 7 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆193 Updated 4 months ago
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆132 Updated this week
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66 Updated 7 months ago
RiddleHe / llm-interp
A collection of lightweight interpretability scripts to understand how LLMs think
☆61 Updated this week
Laz4rz / GPT-2
Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish
☆172 Updated last year
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆248 Updated 2 months ago
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45 Updated 7 months ago
smolorg / smolgrad
small auto-grad engine inspired by Karpathy's micrograd and PyTorch
☆276 Updated 11 months ago
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆144 Updated last year
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆206 Updated last week
Maharshi-Pandya / cudacodes
Learnings and programs related to CUDA
☆420 Updated 3 months ago
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆84 Updated 2 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆163 Updated last week
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆297 Updated 2 months ago
brendanhogan / picoDeepResearch
☆68 Updated 5 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆103 Updated 2 weeks ago
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 6 months ago
huggingface / picotron_tutorial
☆222 Updated 3 weeks ago
divyamakkar0 / JAXformer
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
☆103 Updated 3 weeks ago
aryagxr / cuda
coding CUDA everyday!
☆64 Updated 6 months ago
Quentin-Anthony / torch-profiling-tutorial
☆510 Updated 2 months ago