cooper12121 / llama3-8x8b-MoE
Copies the MLP of llama3 8 times as 8 experts, creates a randomly initialized router, and adds a load-balancing loss to construct an 8x8b MoE model based on llama3 (a rough sketch of the construction is shown below).
☆26 · Updated 6 months ago
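As a rough illustration of the construction described above, here is a minimal PyTorch sketch (not code from this repository; the class name `MoEFromDenseMLP`, the top-k routing, and the Switch-Transformer-style auxiliary loss are illustrative assumptions): every expert starts as a deep copy of the pretrained dense MLP, the router is the only newly initialized component, and a load-balancing loss is returned alongside the layer output.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFromDenseMLP(nn.Module):
    """Hypothetical sketch: build an MoE layer from one pretrained dense MLP."""

    def __init__(self, dense_mlp: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each expert is an exact copy of the pretrained dense MLP weights.
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_mlp) for _ in range(num_experts)])
        # The router is newly created, hence randomly initialized.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.num_experts = num_experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_size)
        probs = F.softmax(self.router(x), dim=-1)      # (tokens, num_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e in range(self.num_experts):
            # Tokens (and their top-k slot) routed to expert e.
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += (weights[token_ids, slot].unsqueeze(-1)
                               * self.experts[e](x[token_ids]))

        # Switch-Transformer-style load-balancing loss: push the fraction of
        # tokens routed to each expert toward its mean routing probability.
        frac = F.one_hot(idx[:, 0], self.num_experts).float().mean(dim=0)
        aux_loss = self.num_experts * torch.sum(frac * probs.mean(dim=0))
        return out, aux_loss
```

Because the copied experts start out identical, the randomly initialized router plus the auxiliary loss is what spreads tokens across experts during continued training; the `aux_loss` term would typically be scaled by a small coefficient and added to the language-modeling loss.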
Alternatives and similar repositories for llama3-8x8b-MoE:
Users interested in llama3-8x8b-MoE are comparing it to the repositories listed below
- ☆45 · Updated 7 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆120 · Updated 2 weeks ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆70 · Updated last year
- ☆36 · Updated 4 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆138 · Updated 4 months ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking ☆47 · Updated 2 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆58 · Updated 3 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆46 · Updated 2 weeks ago
- ☆27 · Updated 5 months ago
- ☆35 · Updated 8 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 7 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated 10 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆55 · Updated 9 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… ☆53 · Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆48 · Updated 3 months ago
- LongQLoRA: Extend Context Length of LLMs Efficiently ☆163 · Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆128 · Updated 7 months ago
- Reformatted Alignment ☆113 · Updated 4 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆96 · Updated 3 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆209 · Updated 3 months ago
- Automatic prompt optimization framework for multi-step agent tasks. ☆27 · Updated 2 months ago
- FuseAI Project ☆80 · Updated this week
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆64 · Updated last year
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆43 · Updated 7 months ago
- ☆48 · Updated 10 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆259 · Updated this week
- ☆17 · Updated last year
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ☆100 · Updated 3 months ago
- ☆88 · Updated last month
- ☆94 · Updated 4 months ago