tokenbender/mHC-manifold-constrained-hyper-connections

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/tokenbender/mHC-manifold-constrained-hyper-connections)

tokenbender / mHC-manifold-constrained-hyper-connections

implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880

☆367

Alternatives and similar repositories for mHC-manifold-constrained-hyper-connections

Users that are interested in mHC-manifold-constrained-hyper-connections are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lucidrains / hyper-connections
View on GitHub
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
☆186May 13, 2026Updated 2 months ago
nil0x9 / flash-muon
View on GitHub
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆257Jun 15, 2025Updated last year
tilde-research / nsa-release
View on GitHub
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆133Jun 24, 2025Updated last year
fla-org / fla-zoo
View on GitHub
Flash-Linear-Attention models beyond language
☆21Aug 28, 2025Updated 10 months ago
zhixuan-lin / forgetting-transformer
View on GitHub
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆150Feb 25, 2026Updated 4 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
shawntan / stickbreaking-attention
View on GitHub
Stick-breaking attention
☆63Jul 1, 2025Updated last year
Unakar / Spectral-Sphere-Optimizer
View on GitHub
Spectral Sphere Optimizer
☆130Mar 23, 2026Updated 3 months ago
Dao-AILab / grouped-latent-attention
View on GitHub
☆135May 29, 2025Updated last year
fla-org / hybrid-distillation
View on GitHub
☆33Dec 31, 2025Updated 6 months ago
kyleliang919 / Super_Muon
View on GitHub
☆68Mar 21, 2025Updated last year
wdlctc / open-attention-residuals
View on GitHub
Open implementation of Attention Residuals (Kimi Team, arXiv:2603.15031)
☆72Apr 30, 2026Updated 2 months ago
leloykun / adaptive-muon
View on GitHub
A single-line modification to any (dualizer-based) optimizer that allows the optimizer to adapt to the scale of the gradients as they cha…
☆19Jan 11, 2025Updated last year
test-time-training / ttt-tk
View on GitHub
☆45Nov 1, 2025Updated 8 months ago
chr26195 / AP-MDM
View on GitHub
This is the official implementation for paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond".
☆23Nov 17, 2025Updated 7 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
edwardmilsom / function-space-learning-rates-paper
View on GitHub
Code for the paper "Function-Space Learning Rates"
☆23Jun 3, 2025Updated last year
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
zhehangdu / Newton-Muon
View on GitHub
The Newton-Muon optimizer
☆30Jun 5, 2026Updated last month
ApexGen-X / MergeVQ
View on GitHub
[CVPR'25] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization
☆51Jul 22, 2025Updated 11 months ago
Yifei-Zuo / Parallax
View on GitHub
Official repository for Parallax (Parameterized Local Linear Attention)
☆63Jul 7, 2026Updated last week
JoeLi12345 / nGPT
View on GitHub
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆112Mar 7, 2025Updated last year
proger / nanokitchen
View on GitHub
Parallel Associative Scan for Language Models
☆18Jan 8, 2024Updated 2 years ago
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 7 months ago
catswe / flash-attention-residuals
View on GitHub
Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)
☆84May 29, 2026Updated last month
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
LINs-lab / GMem
View on GitHub
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆43Mar 11, 2025Updated last year
MoonshotAI / Moonlight
View on GitHub
Muon is Scalable for LLM Training
☆1,500Aug 3, 2025Updated 11 months ago
zhangyitonggg / dllm4code
View on GitHub
Offical implementation of our paper "Exploring the Potential of Diffusion Large Language Models in Code Generation".
☆23Oct 29, 2025Updated 8 months ago
Cranial-XIX / longhorn
View on GitHub
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57Dec 4, 2024Updated last year
mit-han-lab / flash-moba
View on GitHub
☆250Nov 19, 2025Updated 7 months ago
AwesomeSeq / Comba-triton
View on GitHub
☆47Jun 16, 2025Updated last year
KellerJordan / Muon
View on GitHub
Muon is an optimizer for hidden layers in neural networks
☆2,699May 24, 2026Updated last month
sail-sg / Stable-RL
View on GitHub
Rethinking the Trust Region in LLM Reinforcement Learning
☆61Mar 2, 2026Updated 4 months ago
tinnerhrhe / GARDO
View on GitHub
Official codes for the paper "GARDO: Reinforcing Diffusion Models without Reward Hacking"
☆60May 3, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
lucidrains / fast-weight-attention
View on GitHub
Implementation of Fast Weight Attention
☆32Jun 3, 2026Updated last month
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
TerminologyHub / termhub-in-5-minutes
View on GitHub
Developer project for getting basic API integrations working in under 5 minutes
☆11May 22, 2026Updated last month
FFTYYY / mhc-lite
View on GitHub
mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations
☆91Jan 12, 2026Updated 6 months ago
NX-AI / mlstm_kernels
View on GitHub
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆90Jul 6, 2026Updated last week
ByteDance-Seed / AHN
View on GitHub
AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
☆181Oct 17, 2025Updated 8 months ago
LIONS-EPFL / scion
View on GitHub
☆69Apr 8, 2026Updated 3 months ago