cooper12121 / llama3-8x8b-MoE
Copies the MLP of Llama 3 eight times to serve as 8 experts, creates a randomly initialized router, and adds a load-balancing loss to construct an 8x8B MoE model based on Llama 3.
☆25Updated 4 months ago
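The construction described above (copy the dense MLP into several experts, add a randomly initialized router, add a load-balancing loss) can be sketched in a few lines. This is a minimal illustration, not the repository's code: the helper names (`make_dense_mlp`, `upcycle`, `moe_forward`), the two-layer ReLU MLP (Llama 3 actually uses a gated SwiGLU MLP), top-2 routing with renormalized gate weights, and the Switch-Transformer-style auxiliary loss are all assumptions made for the example.

```python
import copy
import math
import random


def make_dense_mlp(d_model, d_hidden, rng):
    """Hypothetical stand-in for a dense feed-forward block (two weight matrices)."""
    w_in = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    return {"w_in": w_in, "w_out": w_out}


def matvec(x, w):
    """Multiply a vector of length n by an n x m matrix."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def mlp_forward(mlp, x):
    # ReLU is used for brevity; this is an assumption, not Llama 3's activation.
    hidden = [max(0.0, v) for v in matvec(x, mlp["w_in"])]
    return matvec(hidden, mlp["w_out"])


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def upcycle(dense_mlp, num_experts, d_model, rng):
    """Copy the dense MLP into `num_experts` experts and add a random router."""
    experts = [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
    router = [[rng.gauss(0, 0.02) for _ in range(num_experts)] for _ in range(d_model)]
    return {"experts": experts, "router": router}


def moe_forward(moe, tokens, top_k=2):
    """Route each token to its top-k experts; return outputs and an auxiliary loss."""
    n = len(moe["experts"])
    counts = [0] * n        # how many routing slots each expert received
    prob_sums = [0.0] * n   # summed router probability per expert
    outputs = []
    for x in tokens:
        probs = softmax(matvec(x, moe["router"]))
        chosen = sorted(range(n), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in chosen)  # renormalize over the chosen experts
        y = [0.0] * len(x)
        for e in chosen:
            gate = probs[e] / norm
            expert_out = mlp_forward(moe["experts"][e], x)
            y = [a + gate * b for a, b in zip(y, expert_out)]
            counts[e] += 1
        for e in range(n):
            prob_sums[e] += probs[e]
        outputs.append(y)
    num_tokens = len(tokens)
    frac_routed = [c / (num_tokens * top_k) for c in counts]
    mean_prob = [s / num_tokens for s in prob_sums]
    # Switch-style load-balancing term (assumed form): n * sum_i f_i * P_i.
    aux_loss = n * sum(f * p for f, p in zip(frac_routed, mean_prob))
    return outputs, aux_loss


if __name__ == "__main__":
    rng = random.Random(0)
    dense = make_dense_mlp(d_model=4, d_hidden=8, rng=rng)
    moe = upcycle(dense, num_experts=8, d_model=4, rng=rng)
    tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
    outputs, aux_loss = moe_forward(moe, tokens)
    print(len(outputs), aux_loss > 0)
```

Because every expert starts as an identical copy and the gate weights of the chosen experts are renormalized to sum to 1, the MoE output in this sketch matches the dense MLP output at initialization (up to floating-point error). The experts only diverge once training begins, and the auxiliary term is what discourages the router from collapsing onto a few of them.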
Related projects
Alternatives and complementary repositories for llama3-8x8b-MoE
- ☆40Updated 5 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…☆49Updated last year
- ☆35Updated 2 months ago
- FuseAI Project☆76Updated 3 months ago
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…☆36Updated 2 weeks ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking☆33Updated this week
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…☆52Updated 7 months ago
- ☆48Updated 8 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆28Updated 5 months ago
- Unofficial implementation of AlpaGasus☆84Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆125Updated 2 months ago
- Fantastic Data Engineering for Large Language Models☆50Updated 3 months ago
- ☆88Updated last month
- code for Scaling Laws of RoPE-based Extrapolation☆70Updated last year
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆124Updated 4 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆38Updated 4 months ago
- Automatic prompt optimization framework for multi-step agent tasks.☆21Updated last week
- 1.4B sLLM for Chinese and English - HammerLLM🔨☆43Updated 7 months ago
- Reformatted Alignment☆112Updated last month
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆77Updated 10 months ago
- The official repository of the Omni-MATH benchmark.☆49Updated 2 weeks ago
- An Experiment on Dynamic NTK Scaling RoPE☆61Updated 11 months ago
- ☆78Updated 2 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve …☆27Updated 3 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆99Updated this week
- The newest version of llama3, with source code explained line by line in Chinese☆22Updated 7 months ago
- ☆129Updated 4 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆51Updated 3 weeks ago
- ☆22Updated 3 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Updated 7 months ago