Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need"
☆56 · Jan 27, 2025 · Updated last year
Alternatives and similar repositories for SGD_SaI
Users that are interested in SGD_SaI are comparing it to the libraries listed below.
- ☆35 · Mar 12, 2025 · Updated last year
- [ICLR 2026] When it comes to optimizers, it's always better to be safe than sorry ☆407 · Sep 26, 2025 · Updated 6 months ago
- ☆13 · Dec 12, 2025 · Updated 3 months ago
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening ☆70 · May 18, 2025 · Updated 10 months ago
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Jul 2, 2024 · Updated last year
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 · Jul 24, 2025 · Updated 8 months ago
- ☆11 · Sep 20, 2024 · Updated last year
- ☆25 · Dec 13, 2024 · Updated last year
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization ☆18 · Mar 7, 2025 · Updated last year
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ☆18 · May 23, 2025 · Updated 10 months ago
- Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM | EMNLP 2025 Findings ☆19 · Oct 17, 2025 · Updated 5 months ago
- The official repo for the DanQing dataset. ☆32 · Jan 16, 2026 · Updated 2 months ago
- Legacy LoRA Trainer that works on T4 GPU Colab for SDXL Model ☆23 · Oct 18, 2025 · Updated 5 months ago
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning ☆28 · Jul 14, 2025 · Updated 8 months ago
- ☆15 · Oct 4, 2024 · Updated last year
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 9 months ago
- Fast, Modern, and Low Precision PyTorch Optimizers ☆128 · Dec 29, 2025 · Updated 2 months ago
- [ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding] ☆15 · Jul 15, 2025 · Updated 8 months ago
- A pytorch realization of adafactor (https://arxiv.org/pdf/1804.04235.pdf) ☆26 · Aug 27, 2019 · Updated 6 years ago
- 0-Shot Tokenizer Transplant ☆14 · May 16, 2025 · Updated 10 months ago
- ☆36 · Oct 7, 2023 · Updated 2 years ago
- [ICCV 2025] Official implementation of "What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?" ☆19 · Aug 7, 2025 · Updated 7 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆14 · Mar 17, 2025 · Updated last year
- ☆15 · Sep 22, 2024 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- ☆19 · Oct 14, 2024 · Updated last year
- ☆33 · Apr 22, 2025 · Updated 11 months ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Mar 17, 2026 · Updated last week
- Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs". ☆74 · Jun 23, 2025 · Updated 9 months ago
- [ICML2025] LoRA fine-tune directly on the quantized models. ☆39 · Nov 25, 2024 · Updated last year
- ☆15 · Mar 20, 2025 · Updated last year
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ☆282 · Jan 16, 2025 · Updated last year
- All You Need to Know About Image Retrieval: a repo to automagically download datasets and run experiments ☆63 · Mar 18, 2025 · Updated last year
- TPDiff: Temporal Pyramid Video Diffusion Model ☆25 · Mar 13, 2025 · Updated last year
- [IJCAI 2024] Official implementation of the paper "Integrating View Conditions for Image Synthesis" ☆25 · Aug 27, 2024 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Sep 12, 2024 · Updated last year
- ☆22 · Nov 9, 2024 · Updated last year
- ☆13 · Dec 22, 2023 · Updated 2 years ago
- ☆15 · Jan 12, 2026 · Updated 2 months ago