cloneofsimo / zeroshampoo
☆33 Updated 4 months ago
Alternatives and similar repositories for zeroshampoo:
Users who are interested in zeroshampoo are comparing it to the libraries listed below.
- Utilities for PyTorch distributed ☆23 Updated last year
- ☆19 Updated 3 months ago
- Latent Diffusion Language Models ☆68 Updated last year
- ☆53 Updated 11 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆21 Updated 2 weeks ago
- supporting pytorch FSDP for optimizers ☆75 Updated last month
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, don't use it for Adam ☆73 Updated 5 months ago
- ☆51 Updated last year
- Automatically take good care of your preemptible TPUs ☆34 Updated last year
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆17 Updated 2 months ago
- A JAX implementation of the continuous time formulation of Consistency Models ☆84 Updated last year
- ☆75 Updated 6 months ago
- FID computation in Jax/Flax. ☆26 Updated 6 months ago
- PyTorch interface for TrueGrad Optimizers ☆41 Updated last year
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆26 Updated this week
- ☆50 Updated 3 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 Updated last year
- Focused on fast experimentation and simplicity ☆64 Updated 3 weeks ago
- ☆21 Updated 6 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆91 Updated 4 months ago
- ☆22 Updated 2 months ago
- WIP ☆92 Updated 5 months ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆31 Updated last year
- ☆31 Updated 2 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆53 Updated 8 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆121 Updated 9 months ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆28 Updated last year
- The 2D discrete wavelet transform for JAX ☆40 Updated last year
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence ☆59 Updated 2 years ago