divyamakkar0 / JAXformerLinks

A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.

☆105

Alternatives and similar repositories for JAXformer

Users that are interested in JAXformer are comparing it to the libraries listed below

Sorting:

MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196Updated 6 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆139Updated last year
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73Updated 7 months ago
google-deepmind / nanodo
☆285Updated last year
microsoft / dion
Dion optimizer algorithm
☆395Updated 2 weeks ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆204Updated 2 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168Updated 4 months ago
LeonGuertler / UnstableBaselines
☆106Updated last month
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆92Updated 2 months ago
Jaykef / Triton-nanoGPT
Custom triton kernels for training Karpathy's nanoGPT.
☆19Updated last year
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆313Updated last month
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45Updated 8 months ago
cloneofsimo / min-fsdp
☆91Updated last year
huggingface / picotron_tutorial
☆224Updated last week
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆302Updated last month
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108Updated 8 months ago
naklecha / llm-inference-optimizations-explained
in this repository, i'm going to implement increasingly complex llm inference optimizations
☆70Updated 6 months ago
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆220Updated this week
dshah3 / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆64Updated last year
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆182Updated this week
modula-systems / modula
🧱 Modula software package
☆307Updated 3 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218Updated last month
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆281Updated 2 months ago
erfanzar / EasyDeL
Accelerate, Optimize performance with streamlined training and serving options with JAX.
☆325Updated this week
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆151Updated 2 years ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated 2 months ago
imbue-ai / carbs
Cost aware hyperparameter tuning algorithm
☆175Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
nebius / kvax
A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.
☆149Updated 3 weeks ago