zqOuO / GWT
☆13Jan 15, 2025Updated last year
Alternatives and similar repositories for GWT
Users that are interested in GWT are comparing it to the libraries listed below
- An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 Workshop)☆17Mar 6, 2025Updated 11 months ago
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un…☆17Dec 17, 2025Updated last month
- ☆35Mar 12, 2025Updated 11 months ago
- This repository contains code for the MicroAdam paper.☆22Dec 14, 2024Updated last year
- Testing various improvements to Ranger21 for 2022☆19Nov 6, 2024Updated last year
- ☆25Oct 31, 2024Updated last year
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD☆33Jul 1, 2025Updated 7 months ago
- Resa: Transparent Reasoning Models via SAEs☆47Sep 23, 2025Updated 4 months ago
- ☆27Aug 25, 2023Updated 2 years ago
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?☆119Oct 21, 2024Updated last year
- ☆54Jul 7, 2025Updated 7 months ago
- Work in progress.☆79Nov 25, 2025Updated 2 months ago
- ☆31Nov 11, 2024Updated last year
- Preparing for ML Interviews.☆53Jan 12, 2026Updated last month
- ☆10Apr 5, 2024Updated last year
- Continual Resilient (CoRe) Optimizer for PyTorch☆11Jun 10, 2024Updated last year
- APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention☆270Nov 29, 2025Updated 2 months ago
- [ICLR 2025] AdaFisher: Adaptive Second Order Optimization via Fisher Information☆51Feb 7, 2025Updated last year
- PyTorch interface for TrueGrad Optimizers☆43Aug 8, 2023Updated 2 years ago
- This is the code of an agentic RAG method with dynamic workflow.☆13Jan 22, 2026Updated 3 weeks ago
- Efficient misspecification uncertainties for linear regression☆16Feb 3, 2026Updated last week
- ☆10Sep 29, 2024Updated last year
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity☆22Aug 28, 2025Updated 5 months ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- Source code for paper "Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration" of NeurIPS 2019☆10Jan 25, 2024Updated 2 years ago
- ☆63Jul 10, 2025Updated 7 months ago
- Official implementation for Text Generation Beyond Discrete Token Sampling☆21Aug 11, 2025Updated 6 months ago
- Aline: Agentic Git for Vibe Coders☆27Nov 26, 2025Updated 2 months ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and …☆11Jun 18, 2024Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)☆11Apr 18, 2025Updated 9 months ago
- Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform☆44Apr 26, 2019Updated 6 years ago
- ☆18Mar 2, 2025Updated 11 months ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning☆14Jun 28, 2025Updated 7 months ago
- Reproduction of "Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization" for the Reproducibility challenge@NeurIPS…☆11Jan 14, 2020Updated 6 years ago
- Brax + Pufferlib + CARBS for gpu-accelerated robotics RL☆12Jun 12, 2025Updated 8 months ago
- ☆11Jun 12, 2024Updated last year
- DINO-based perceptual losses and FDD feature extraction☆24Jan 7, 2026Updated last month
- Yet another frontend for LLM, written using .NET and WinUI 3☆10Sep 14, 2025Updated 5 months ago