cooper12121 / llama3-8x8b-MoE

Copies the llama3 MLP 8 times to form 8 experts, creates a randomly initialized router, and adds a load balancing loss to construct an 8x8b MoE model based on llama3.
25 · Updated 4 months ago
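
The construction in the description can be illustrated with a small, dependency-free sketch. This is not the repository's code. The toy two-matrix ReLU MLP stands in for llama3's gated MLP, and the names `make_mlp`, `upcycle`, and `moe_forward`, the top-2 routing, and the Switch-Transformer-style form of the load balancing loss are all assumptions made for illustration.

```python
import copy
import math
import random


def make_mlp(d_model, d_ff, rng):
    """Toy stand-in for a dense MLP (hypothetical; llama3 uses a gated MLP)."""
    return {
        "w_in": [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(d_ff)],
        "w_out": [[rng.gauss(0, 0.02) for _ in range(d_ff)] for _ in range(d_model)],
    }


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp_forward(mlp, x):
    hidden = [max(0.0, v) for v in matvec(mlp["w_in"], x)]  # ReLU for brevity
    return matvec(mlp["w_out"], hidden)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def upcycle(dense_mlp, d_model, num_experts, rng):
    """Copy the dense MLP into every expert and create a random router."""
    experts = [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
    router = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(num_experts)]
    return experts, router


def moe_forward(experts, router, tokens, top_k=2):
    """Route each token to its top_k experts; also return an auxiliary loss."""
    n = len(experts)
    counts = [0] * n        # how many tokens were dispatched to each expert
    prob_sums = [0.0] * n   # summed router probability per expert
    outputs = []
    for x in tokens:
        probs = softmax(matvec(router, x))
        chosen = sorted(range(n), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in chosen)  # renormalize over chosen experts
        y = [0.0] * len(x)
        for i in chosen:
            counts[i] += 1
            out = mlp_forward(experts[i], x)
            y = [a + (probs[i] / norm) * b for a, b in zip(y, out)]
        for i in range(n):
            prob_sums[i] += probs[i]
        outputs.append(y)
    t = len(tokens)
    frac = [c / (t * top_k) for c in counts]   # dispatch fraction f_i
    mean_prob = [s / t for s in prob_sums]     # mean router probability P_i
    aux_loss = n * sum(f * p for f, p in zip(frac, mean_prob))  # assumed loss form
    return outputs, aux_loss


rng = random.Random(0)
d_model, d_ff, num_experts = 8, 16, 8
dense = make_mlp(d_model, d_ff, rng)
experts, router = upcycle(dense, d_model, num_experts, rng)
tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(32)]
moe_out, aux_loss = moe_forward(experts, router, tokens)
dense_out = [mlp_forward(dense, x) for x in tokens]
```

Because every expert starts as an identical copy and the selected gate weights are renormalized to sum to 1, `moe_out` matches `dense_out` at initialization in this sketch. The experts only diverge once training updates them separately, and the auxiliary loss discourages the router from sending most tokens to a few experts.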
