EIFY / mup-vit
Everything you need to reproduce "Better plain ViT baselines for ImageNet-1k" in PyTorch, and more
☆12 Updated last week
Alternatives and similar repositories for mup-vit
Users who are interested in mup-vit are comparing it to the libraries listed below
- FID computation in JAX/Flax. ☆29 Updated last year
- Automatically take good care of your preemptible TPUs ☆37 Updated 2 years ago
- Train vision models using JAX and 🤗 transformers ☆100 Updated last month
- Python library for argument and configuration management ☆56 Updated 3 years ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam ☆86 Updated last year
- PyTorch interface for TrueGrad Optimizers ☆43 Updated 2 years ago
- ☆92 Updated last year
- HomebrewNLP in JAX flavour for maintainable TPU-Training ☆51 Updated 2 years ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆132 Updated last year
- ☆34 Updated last year
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆37 Updated 3 years ago
- ☆91 Updated 3 years ago
- Focused on fast experimentation and simplicity ☆80 Updated last year
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆82 Updated 4 years ago
- Easy Hypernetworks in PyTorch and JAX ☆106 Updated 3 years ago
- CLOOB training (JAX) and inference (JAX and PyTorch) ☆74 Updated 3 years ago
- GPU tester that detects broken and slow GPUs in a cluster ☆72 Updated 2 years ago
- These papers will provide unique insightful concepts that will broaden your perspective on neural networks and deep learning ☆48 Updated 2 years ago
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md ☆25 Updated 3 years ago
- Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets' ☆39 Updated 4 years ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 Updated 3 years ago
- Contrastive Language-Image Pretraining ☆144 Updated 3 years ago
- An implementation of the Llama architecture, to instruct and delight ☆21 Updated 8 months ago
- A case study of efficient training of large language models using commodity hardware. ☆68 Updated 3 years ago
- Utilities for PyTorch distributed ☆25 Updated 11 months ago
- Utilities for Training Very Large Models ☆58 Updated last year
- ☆53 Updated 2 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 Updated 2 years ago
- ☆23 Updated last year
- An open source implementation of CLIP. ☆33 Updated 3 years ago