google-deepmind / dksLinks

Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural network models (and their initializations) to make them easier to train.

☆74

Alternatives and similar repositories for dks

Users that are interested in dks are comparing it to the libraries listed below

Sorting:

lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆188Updated 2 weeks ago
dylandoblar / noether-networks
Meta-learning inductive biases in the form of useful conserved quantities.
☆37Updated 2 years ago
aks2203 / easy-to-hard
Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"
☆59Updated 3 years ago
google-deepmind / tf2jax
☆115Updated last month
google-deepmind / eigengame
Open source code for EigenGame.
☆33Updated 2 years ago
shikaiqiu / compute-better-spent
☆58Updated last year
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37Updated 2 years ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88Updated last year
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25Updated 9 months ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year
mgrankin / minGPT
minGPT in JAX
☆48Updated 3 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆69Updated 2 years ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
cgarciae / einop
☆60Updated 3 years ago
edwardjhu / TP4
Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522)
☆63Updated 4 years ago
cgarciae / nnx
Neural Networks for JAX
☆84Updated last year
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆45Updated 2 years ago
bhoov / energy-transformer-jax
The Energy Transformer block, in JAX
☆58Updated last year
lucidrains / mlp-gpt-jax
A GPT, made only of MLPs, in Jax
☆58Updated 4 years ago
crowsonkb / dice-mc
DiCE: The Infinitely Differentiable Monte-Carlo Estimator
☆32Updated 2 years ago
joey00072 / microjax
Jax like function transformation engine but micro, microjax
☆33Updated last year
AhmedImtiazPrio / grok-adversarial
Deep Networks Grok All the Time and Here is Why
☆37Updated last year
sholtodouglas / scalingExperiments
☆62Updated 3 years ago
nestordemeure / flaxOptimizers
A collection of optimizers, some arcane others well known, for Flax.
☆29Updated 4 years ago
gisilvs / AEF
☆33Updated 2 years ago
HomebrewML / TrueGrad
PyTorch interface for TrueGrad Optimizers
☆43Updated 2 years ago
matthias-wright / jax-fid
FID computation in Jax/Flax.
☆28Updated last year
google-deepmind / neural_testbed
☆192Updated 4 months ago
google-research / precondition
☆31Updated 4 months ago