An implementation of the PSGD Kron second-order optimizer for PyTorch
☆98 · Jul 24, 2025 · Updated 7 months ago
Alternatives and similar repositories for kron_torch
Users who are interested in kron_torch are comparing it to the libraries listed below.
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆190 · Jan 11, 2026 · Updated last month
- Efficient optimizers ☆285 · Dec 20, 2025 · Updated 2 months ago
- ☆29 · Sep 30, 2025 · Updated 5 months ago
- ☆27 · May 3, 2024 · Updated last year
- Implementation of PSGD optimizer in JAX ☆35 · Dec 31, 2024 · Updated last year
- recipe for training fully-featured self supervised image jepa models ☆12 · Jun 4, 2025 · Updated 9 months ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 · Aug 29, 2023 · Updated 2 years ago
- ☆253 · Dec 2, 2024 · Updated last year
- A repo based on XiLin Li's PSGD repo that extends some of the experiments. ☆14 · Oct 7, 2024 · Updated last year
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 · Jul 30, 2020 · Updated 5 years ago
- 🧱 Modula software package ☆324 · Aug 18, 2025 · Updated 6 months ago
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆18 · Jul 24, 2025 · Updated 7 months ago
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆21 · Jan 8, 2025 · Updated last year
- ☆34 · Sep 10, 2024 · Updated last year
- A dashboard for exploring timm learning rate schedulers ☆19 · Nov 22, 2024 · Updated last year
- ☆20 · May 30, 2024 · Updated last year
- Official Pytorch Implementation of Self-emerging Token Labeling ☆35 · Mar 27, 2024 · Updated last year
- XmodelLM ☆38 · Nov 19, 2024 · Updated last year
- ☆21 · Sep 6, 2021 · Updated 4 years ago
- ☆21 · Apr 13, 2024 · Updated last year
- [NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Underst… ☆23 · Mar 16, 2025 · Updated 11 months ago
- Official PyTorch implementation of TokenSet.