google / drjax

☆12

Alternatives and similar repositories for drjax

Users who are interested in drjax are comparing it to the libraries listed below.

edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23Updated 2 months ago
cloneofsimo / zeroshampoo
☆34Updated 10 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
shikaiqiu / compute-better-spent
☆53Updated 10 months ago
google-deepmind / dks
Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo…
☆71Updated last month
HazyResearch / train-tk
train with kittens!
☆62Updated 9 months ago
AndPotap / einsum-search
☆32Updated 10 months ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆36Updated 2 years ago
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆32Updated 2 months ago
cloneofsimo / min-fsdp
☆83Updated last year
yixiaoer / tpu-training-example
☆14Updated last year
srush / drop7
☆18Updated last year
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆36Updated 9 months ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35Updated last year
joey00072 / microjax
Jax like function transformation engine but micro, microjax
☆33Updated 9 months ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21Updated 2 months ago
ethansmith2000 / TransformerExperiments
☆19Updated 2 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 8 months ago
dvruette / barrel-rec-pytorch
☆53Updated last year
lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆179Updated last week
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆35Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46Updated last year
lianakoleva / no-libtorch-compile
☆21Updated 5 months ago
main-horse / hnet
H-Net Dynamic Hierarchical Architecture
☆65Updated 2 weeks ago
KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆35Updated last year
srush / Tensor-Puzzles-Penzai
☆21Updated last year
cgarciae / einop
☆60Updated 3 years ago
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆85Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year