cooper12121 / llama3-8x8b-MoE

Copies the llama3 MLP 8 times to serve as 8 experts, creates a randomly initialized router, and adds a load-balancing loss to construct an 8x8b MoE model based on llama3.
26 · Updated 6 months ago
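
The construction in the description can be illustrated with a minimal NumPy sketch. This is not the repository's code: the tiny dimensions, the top-2 routing, the Switch-Transformer-style form of the load-balancing loss, and every function name here (`make_dense_mlp`, `upcycle`, `moe_forward`) are assumptions chosen for illustration, and random weights stand in for a pretrained llama3 MLP.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    return x / (1.0 + np.exp(-x))

def make_dense_mlp(hidden, inter):
    # Hypothetical stand-in for one pretrained llama3 MLP block
    # (gate, up and down projections); real weights would be loaded instead.
    return {
        "gate": rng.normal(0.0, 0.02, (hidden, inter)),
        "up": rng.normal(0.0, 0.02, (hidden, inter)),
        "down": rng.normal(0.0, 0.02, (inter, hidden)),
    }

def mlp_forward(w, x):
    # llama-style gated MLP: down(silu(gate(x)) * up(x))
    return (silu(x @ w["gate"]) * (x @ w["up"])) @ w["down"]

def upcycle(dense_mlp, num_experts=8):
    # Copy the dense MLP once per expert and create a randomly initialized router.
    experts = [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
    hidden = dense_mlp["gate"].shape[0]
    router = rng.normal(0.0, 0.02, (hidden, num_experts))
    return experts, router

def moe_forward(experts, router, x, top_k=2):
    num_experts = len(experts)
    logits = x @ router                                  # (tokens, experts)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs = probs / probs.sum(axis=-1, keepdims=True)    # softmax over experts

    # Assumed top-k routing with renormalized gate weights.
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]
    top_p = np.take_along_axis(probs, top_idx, axis=-1)
    top_p = top_p / top_p.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for e in range(num_experts):
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size:
            weight = top_p[token_ids, slot][:, None]
            out[token_ids] += weight * mlp_forward(experts[e], x[token_ids])

    # Assumed Switch-style auxiliary loss: N * sum_e f_e * P_e, where f_e is the
    # fraction of routing slots sent to expert e and P_e its mean router probability.
    f = np.bincount(top_idx.ravel(), minlength=num_experts) / top_idx.size
    p_mean = probs.mean(axis=0)
    aux_loss = num_experts * float(np.sum(f * p_mean))
    return out, aux_loss

dense = make_dense_mlp(hidden=16, inter=32)
experts, router = upcycle(dense, num_experts=8)
tokens = rng.normal(size=(10, 16))
moe_out, aux_loss = moe_forward(experts, router, tokens)
```

Because every expert starts as an identical copy and the selected gate weights are renormalized to sum to 1, the MoE output in this sketch matches the dense MLP output immediately after construction. The experts only diverge once training updates them separately, and the auxiliary loss is what discourages the random router from collapsing onto a few experts.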

Alternatives and similar repositories for llama3-8x8b-MoE:

Users who are interested in llama3-8x8b-MoE are comparing it to the libraries listed below.