cooper12121 / llama3-8x8b-MoE
Copies the Llama 3 MLP eight times to serve as eight experts, adds a randomly initialized router, and applies a load-balancing loss, yielding an 8x8B MoE model based on Llama 3.
☆25 Updated 4 months ago
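The repository description above outlines the whole construction: duplicate the pretrained dense MLP into eight experts, attach a freshly (randomly) initialized router, and train with an auxiliary load-balancing loss. Below is a minimal PyTorch sketch of that idea; it is not the repository's code, and the class name, top-k routing choice, and Switch-Transformer-style balancing term are illustrative assumptions.

```python
# Hypothetical sketch, not the repository's actual code: upcycle one dense MLP
# into an 8-expert MoE layer with a randomly initialized router and a
# Switch-Transformer-style load-balancing loss.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFromDenseMLP(nn.Module):
    def __init__(self, dense_mlp: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each expert starts as an exact copy of the pretrained dense MLP.
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
        )
        # The router is new, so it keeps its default random initialization.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.num_experts = num_experts
        self.top_k = top_k

    def forward(self, x):                                   # x: (batch, seq, hidden)
        probs = self.router(x).softmax(dim=-1)              # (B, S, E)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Routing weight of expert e for every token (0 if not in its top-k).
            weight = (topk_probs * (topk_idx == e)).sum(dim=-1, keepdim=True)
            out = out + weight * expert(x)                   # dense loop: simple, not fast

        # Load-balancing loss: fraction of routing slots assigned to each expert
        # times its mean router probability, which pushes toward a uniform split.
        token_frac = F.one_hot(topk_idx, self.num_experts).float().mean(dim=(0, 1, 2))
        prob_frac = probs.mean(dim=(0, 1))
        aux_loss = self.num_experts * (token_frac * prob_frac).sum()
        return out, aux_loss
```

In a LLaMA-style decoder one would presumably apply this per layer, replacing each block's MLP with `MoEFromDenseMLP(block.mlp, hidden_size)` and adding the summed `aux_loss` terms, scaled by a small coefficient, to the language-modeling loss.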
Related projects
Alternatives and complementary repositories for llama3-8x8b-MoE
- ☆34 Updated 2 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and … ☆18 Updated 6 months ago
- FuseAI Project ☆76 Updated 2 months ago
- ☆37 Updated 4 months ago
- The newest version of Llama 3, with the source code explained line by line in Chinese ☆22 Updated 6 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆52 Updated 6 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆37 Updated 4 months ago
- Reformatted Alignment ☆112 Updated last month
- code for Scaling Laws of RoPE-based Extrapolation ☆70 Updated last year
- Unofficial implementation of AlpaGasus ☆84 Updated last year
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆43 Updated 7 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆25 Updated 5 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆123 Updated 2 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆96 Updated last week
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆16 Updated 2 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆116 Updated 4 months ago
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar… ☆35 Updated this week
- Fantastic Data Engineering for Large Language Models ☆49 Updated 3 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆69 Updated 2 weeks ago
- ☆89 Updated last month
- An Experiment on Dynamic NTK Scaling RoPE ☆61 Updated 11 months ago
- ☆33 Updated 6 months ago
- ☆62 Updated last month
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆72 Updated 7 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems ☆76 Updated 10 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆60 Updated last year
- ☆77 Updated last month
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continual pre-training improves … ☆27 Updated 2 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆43 Updated last week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 Updated last month