iShohei220 / adopt
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate"
☆227 · Updated this week
Related projects
Alternatives and complementary repositories for adopt
- For optimization algorithm research and development. ☆417 · Updated this week
- The AdEMAMix Optimizer: Better, Faster, Older. ☆171 · Updated 2 months ago
- ☆292 · Updated 4 months ago
- Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new… ☆118 · Updated 3 months ago
- ☆122 · Updated this week
- TensorHue is a Python library that allows you to visualize tensors right in your console, making understanding and debugging tensor conte… ☆106 · Updated last month
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆326 · Updated last month
- ☆139 · Updated 2 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆252 · Updated 5 months ago
- 94% on CIFAR-10 in 2.59 seconds 💨 96% in 27 seconds ☆168 · Updated this week
- Implementation of the proposed minGRU in Pytorch ☆229 · Updated 3 weeks ago
- A simple implementation of Bayesian Flow Networks (BFN) ☆239 · Updated 10 months ago
- Library for Jacobian descent with PyTorch. It enables optimization of neural networks with multiple losses (e.g. multi-task learning). ☆153 · Updated this week
- Scalable neural net training via automatic normalization in the modular norm. ☆119 · Updated 2 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆84 · Updated 2 months ago
- WIP ☆89 · Updated 2 months ago
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago
- Efficient optimizers ☆58 · Updated this week
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆242 · Updated this week
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆94 · Updated 2 weeks ago
- Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch ☆247 · Updated 2 years ago
- A repository for log-time feedforward networks ☆216 · Updated 7 months ago
- A Jax-based library for designing and training transformer models from scratch. ☆276 · Updated 2 months ago
- Unofficial JAX implementations of deep learning research papers ☆151 · Updated 2 years ago
- ☆131 · Updated last year
- ☆76 · Updated 6 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆156 · Updated 10 months ago
- Run PyTorch in JAX. 🤝 ☆199 · Updated last year
- Train VAE like a boss ☆242 · Updated 3 weeks ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆112 · Updated 6 months ago