AnonymousAlethiometer/SGD_SaI

Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need"

☆55

Alternatives and similar repositories for SGD_SaI

Users who are interested in SGD_SaI are comparing it to the repositories listed below.

kyleliang919 / C-Optim
[ICLR 2026] When it comes to optimizers, it's always better to be safe than sorry
☆417 · Sep 26, 2025 · Updated 10 months ago
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
Gen-Verse / Diffusion-Sharpening
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
☆72 · May 18, 2025 · Updated last year
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆30 · Jul 2, 2024 · Updated 2 years ago
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
CUC-MIPG / UnifyEdit
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
☆13 · Dec 29, 2024 · Updated last year
MLLM-Data-Contamination / MM-Detect
Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM | EMNLP 2025 Findings
☆18 · Oct 17, 2025 · Updated 9 months ago
OpenMOSS / rope_pp
[ICLR26] Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
☆33 · Dec 9, 2025 · Updated 7 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 · Dec 29, 2025 · Updated 6 months ago
Infini-AI-Lab / S2FT
☆19 · Jan 3, 2025 · Updated last year
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30 · Jul 24, 2025 · Updated last year
RTkenny / RiskPO
Official implementation of 'RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training', accepted by ICLR 2026
☆18 · Oct 15, 2025 · Updated 9 months ago
Rishit-dagli / Squeeze3D
Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor
☆23 · Jun 12, 2025 · Updated last year
assafbk / OPRM
Overflow Prevention Enhances Long-Context Recurrent LLMs (COLM 2025)
☆18 · Jul 8, 2025 · Updated last year
NadavSc / Diff-Mamba
☆22 · Jan 23, 2026 · Updated 6 months ago
webis-de / set-encoder
Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders
☆19 · May 23, 2025 · Updated last year
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
AgenticIR-Lab / OThink-R1
This is the official code for OThink-R1 project.
☆21 · Jun 19, 2025 · Updated last year
xie-lab-ml / CoRe2
[TPAMI] The official implementation of our paper "Improved and Accelerated Text-to-Image Generation with Collect, Reflect, and Refine".
☆30 · Mar 8, 2026 · Updated 4 months ago
roebel / MBExWN_Vocoder
The Multi-band Excited WaveNet
☆17 · Feb 2, 2023 · Updated 3 years ago
DeadAt0m / adafactor-pytorch
A PyTorch implementation of Adafactor (https://arxiv.org/pdf/1804.04235.pdf)
☆26 · Aug 27, 2019 · Updated 6 years ago
kinoshitadaisuke / ncu_astroinformatics_202209
The repository for the course "Astroinformatics" offered at Institute of Astronomy, National Central University, from Sep/2022 to Jan/202…
☆10 · Jun 4, 2024 · Updated 2 years ago
VITA-Group / Neon
[ICLR 2026 Oral] Neon: Negative Extrapolation From Self-Training Improves Image Generation
☆25 · Oct 7, 2025 · Updated 9 months ago
camenduru / Multi-LoRA-Composition-jupyter
☆13 · Feb 28, 2024 · Updated 2 years ago
DEX-1101 / kohya-trainer
Legacy LoRA trainer that works on a T4 GPU in Colab for SDXL models
☆25 · Oct 18, 2025 · Updated 9 months ago
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
MIGHTYEZ / Inversion-DPO
☆19 · Jul 22, 2025 · Updated last year
davidserra9 / abair
[CVIU'26] Adaptive Blind All-in-One Image Restoration
☆35 · Mar 17, 2025 · Updated last year
SLIT-AI / WRPO
[ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion
☆14 · Mar 17, 2025 · Updated last year
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆20 · Jul 15, 2026 · Updated last week
hithqd / DynamicControl
☆41 · Jan 10, 2025 · Updated last year
DAMO-NLP-SG / Inf-CLIP
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C…
☆287 · Jan 16, 2025 · Updated last year
MCG-NJU / FlowDCN
[NeurIPS 2024] Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
☆37 · Dec 23, 2024 · Updated last year
Atotti / miipher-2
Training and inference code for a reproduction of Google's speech restoration model Miipher-2. Pretrained models are also released.
☆32 · Feb 7, 2026 · Updated 5 months ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆129 · May 16, 2026 · Updated 2 months ago
LucipherDev / ComfyUI-Golden-Noise
ComfyUI Custom Node for "Golden Noise for Diffusion Models: A Learning Framework". This node refines the initial latent noise in the diff…
☆24 · Mar 28, 2025 · Updated last year
lmsdss / LayerNorm-Scaling
[NeurIPS 2025] Official Pytorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L…
☆72 · Mar 3, 2026 · Updated 4 months ago
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆188 · Sep 12, 2024 · Updated last year
GuoweiXu368 / OmniMocap-X
Dataset for paper "OmniMotion-X: Versatile Multimodal Whole-Body Motion Generation"
☆23 · Dec 22, 2025 · Updated 7 months ago