google-deepmind / dks
Multi-framework implementation of Deep Kernel Shaping (DKS) and Tailored Activation Transformations (TAT), methods that modify neural network models (and their initializations) to make them easier to train.
☆64
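For orientation, below is a minimal sketch of what using the library might look like in JAX. It assumes the JAX sub-package exposes `activation_transform.get_transformed_activations` as described in the project README; the `subnet_max_func` and the `depth` constant are placeholder assumptions for a plain feed-forward chain, and keyword names may differ from the installed version.

```python
# Minimal sketch (not the verbatim dks API): obtain a DKS-transformed activation
# and use it like an ordinary JAX nonlinearity. Assumes dks.jax exposes
# activation_transform.get_transformed_activations; subnet_max_func below is a
# placeholder that treats the network as a single chain of `depth` nonlinear layers.
import jax
from dks.jax import activation_transform

depth = 10  # assumed depth of the toy feed-forward chain


def subnet_max_func(x, r_fn):
    # Placeholder: for a simple chain the "maximal" subnetwork is just the
    # full composition of `depth` transformed activations.
    for _ in range(depth):
        x = r_fn(x)
    return x


# Ask dks for a DKS-transformed softplus (keyword names may differ by version).
act_dict = activation_transform.get_transformed_activations(
    ["softplus"], method="DKS", subnet_max_func=subnet_max_func)
transformed_softplus = act_dict["softplus"]

# Apply the transformed activation element-wise, as with any JAX nonlinearity.
x = jax.random.normal(jax.random.PRNGKey(0), (4, 16))
y = transformed_softplus(x)
print(y.shape)
```

In a real model, the transformed activation replaces the original nonlinearity, and the weights should follow the initialization scheme DKS/TAT assumes (e.g. the orthogonal/delta initializations described in the papers).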
Related projects
Alternatives and complementary repositories for dks
- PyTorch implementation of preconditioned stochastic gradient descent (affine group preconditioner, low-rank approximation preconditioner, …) ☆127
- JAX implementation of "Learning to Learn by Gradient Descent by Gradient Descent" ☆26
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37
- A port of muP to JAX/Haiku. ☆25
- An experiment in using Tangent to autodiff Triton. ☆72
- Open-source code for EigenGame. ☆28
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆60
- JMP is a mixed-precision library for JAX. ☆187
- A selection of neural network models ported from torchvision for JAX & Flax. ☆44
- CUDA implementation of autoregressive linear attention, incorporating the latest research findings. ☆43
- Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax). ☆104
- Automatically take good care of your preemptible TPUs. ☆32
- Neural networks for JAX. ☆83
- Unofficial but efficient implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX. ☆79
- A GPT made only of MLPs, in JAX. ☆55
- Easy hypernetworks in PyTorch and JAX. ☆96
- LoRA for arbitrary JAX models and functions. ☆132
- Demo of the unit_scaling library, showing how a model can easily be adapted to train in FP8. ☆35
- A collection of optimizers for Flax, some well known and others arcane. ☆29
- PyTorch-like dataloaders in JAX. ☆59
- Fast training of unitary deep network layers from low-rank updates. ☆28
- A functional training-loop library for JAX. ☆85
- Transformer with Mu-Parameterization (muP), implemented in JAX/Flax; supports FSDP on TPU pods. ☆29