bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆128 · Updated 2 weeks ago
Alternatives and similar repositories for ZClip
Users who are interested in ZClip are comparing it to the repositories listed below.
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆87 · Updated 3 weeks ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆91 · Updated last month
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆54 · Updated 4 months ago
- Implementation of a multimodal diffusion transformer in Pytorch ☆102 · Updated last year
- Implementation of Infini-Transformer in Pytorch ☆111 · Updated 6 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆110 · Updated 7 months ago
- Focused on fast experimentation and simplicity ☆76 · Updated 6 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆246 · Updated 3 months ago
- Implementation of the proposed MaskBit from Bytedance AI ☆82 · Updated 8 months ago
- Just another reasonably minimal repo for class-conditional training of pixel-space diffusion transformers. ☆114 · Updated last month
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆52 · Updated 5 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆287 · Updated last month
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆175 · Updated last year
- ☆222 · Updated last month
- open source alpha evolve ☆66 · Updated last month
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆127 · Updated 10 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆115 · Updated last week
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆101 · Updated 8 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆185 · Updated 3 months ago
- ☆81 · Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆101 · Updated 6 months ago
- Implementation of Agent Attention in Pytorch ☆90 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆158 · Updated 3 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆59 · Updated 8 months ago
- ☆290 · Updated 2 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆92 · Updated 3 months ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. ☆81 · Updated 2 months ago
- Esoteric Language Models ☆87 · Updated 3 weeks ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆165 · Updated 5 months ago
- H-Net: Hierarchical Network with Dynamic Chunking ☆115 · Updated this week