☆13 · Jan 15, 2025 · Updated last year
Alternatives and similar repositories for GWT
Users who are interested in GWT are comparing it to the libraries listed below.
- An extension of the GaLore paper that performs natural gradient descent in a low-rank subspace ☆18 · Oct 21, 2024 · Updated last year
- Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 Workshop) ☆17 · Mar 6, 2025 · Updated last year
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… ☆17 · Dec 17, 2025 · Updated 2 months ago
- ☆34 · Mar 12, 2025 · Updated 11 months ago
- This repository contains code for the MicroAdam paper. ☆21 · Dec 14, 2024 · Updated last year
- Testing various improvements to Ranger21 for 2022 ☆19 · Nov 6, 2024 · Updated last year
- ☆25 · Oct 31, 2024 · Updated last year
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD ☆33 · Jul 1, 2025 · Updated 8 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 · Sep 23, 2025 · Updated 5 months ago
- ☆27 · Aug 25, 2023 · Updated 2 years ago
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? ☆118 · Oct 21, 2024 · Updated last year
- ☆55 · Jul 7, 2025 · Updated 8 months ago
- Work in progress. ☆79 · Nov 25, 2025 · Updated 3 months ago
- ☆32 · Nov 11, 2024 · Updated last year
- Preparing for ML Interviews. ☆54 · Jan 12, 2026 · Updated last month
- Continual Resilient (CoRe) Optimizer for PyTorch ☆11 · Jun 10, 2024 · Updated last year
- ☆10 · Apr 5, 2024 · Updated last year
- APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention ☆271 · Nov 29, 2025 · Updated 3 months ago
- PyTorch interface for TrueGrad Optimizers ☆43 · Aug 8, 2023 · Updated 2 years ago
- [ICLR 2025] AdaFisher: Adaptive Second Order Optimization via Fisher Information ☆51 · Feb 7, 2025 · Updated last year
- Source code for the NeurIPS 2019 paper "Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration" ☆10 · Jan 25, 2024 · Updated 2 years ago
- Efficient misspecification uncertainties for linear regression ☆16 · Feb 19, 2026 · Updated 2 weeks ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ☆22 · Aug 28, 2025 · Updated 6 months ago
- Code for an agentic RAG method with a dynamic workflow. ☆12 · Jan 22, 2026 · Updated last month
- ☆10 · Sep 29, 2024 · Updated last year
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Apr 28, 2023 · Updated 2 years ago
- ☆63 · Jul 10, 2025 · Updated 7 months ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆19 · Mar 2, 2025 · Updated last year
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Sep 14, 2025 · Updated 5 months ago
- Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform ☆44 · Apr 26, 2019 · Updated 6 years ago
- ☆12 · Mar 1, 2025 · Updated last year
- DINO-based perceptual losses and FDD feature extraction ☆25 · Jan 7, 2026 · Updated 2 months ago
- Code for "Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling" [ICML 2021] ☆10 · Mar 14, 2022 · Updated 3 years ago
- ☆11 · Jun 12, 2024 · Updated last year
- Explanation Optimization ☆13 · Oct 16, 2020 · Updated 5 years ago
- Brax + Pufferlib + CARBS for GPU-accelerated robotics RL ☆12 · Jun 12, 2025 · Updated 8 months ago
- Building the Bi-LSTM & the CNN-GAN models to compose Classical Music in different eras ☆11 · Aug 2, 2021 · Updated 4 years ago
- This tool displays tflite signatures and rewrites the input/output OP name to the name of the signature. There is no need to install Tens… ☆14 · Dec 13, 2023 · Updated 2 years ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆14 · Jun 28, 2025 · Updated 8 months ago