TorchJD / torchjd
Library for Jacobian descent with PyTorch. It enables optimization of neural networks with multiple losses (e.g. multi-task learning).
☆150 Updated this week
Related projects
Alternatives and complementary repositories for torchjd
- ☆138 Updated 2 months ago
- ☆116 Updated this week
- A repository for log-time feedforward networks ☆216 Updated 7 months ago
- TensorHue is a Python library that allows you to visualize tensors right in your console, making understanding and debugging tensor conte… ☆106 Updated last month
- A simple implementation of Bayesian Flow Networks (BFN) ☆238 Updated 10 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆170 Updated last month
- Scalable neural net training via automatic normalization in the modular norm. ☆118 Updated 2 months ago
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" ☆65 Updated this week
- WIP ☆88 Updated 2 months ago
- For optimization algorithm research and development. ☆408 Updated this week
- Annotated version of the Mamba paper ☆455 Updated 8 months ago
- Easy Hypernetworks in PyTorch and JAX ☆95 Updated last year
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆117 Updated 3 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆234 Updated this week
- ☆46 Updated last month
- Implementation of the proposed minGRU in PyTorch ☆228 Updated 2 weeks ago
- ☆292 Updated 4 months ago
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆326 Updated 3 weeks ago
- ☆76 Updated 6 months ago
- Understand and test language model architectures on synthetic tasks. ☆161 Updated 6 months ago
- Efficient optimizers ☆42 Updated this week
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs). ☆173 Updated this week
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 Updated last year
- Implementation of Diffusion Transformer (DiT) in JAX ☆252 Updated 4 months ago
- ☆46 Updated last month
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆111 Updated 2 months ago
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in PyTorch ☆77 Updated 9 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆84 Updated 2 months ago
- ☆197 Updated 3 months ago