giangdip2410 / HyperRouter

Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
31 · Updated 9 months ago

Related projects: