SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆131 · Updated 10 months ago
Alternatives and similar repositories for Skywork-MoE:
Users interested in Skywork-MoE are comparing it to the libraries listed below.
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆160 · Updated this week
- Repository of LV-Eval Benchmark ☆63 · Updated 7 months ago
- Reformatted Alignment ☆115 · Updated 6 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 · Updated 3 months ago
- ☆142 · Updated last month
- ☆46 · Updated 10 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆171 · Updated last month
- ☆63 · Updated 4 months ago
- Efficient Mixture of Experts for LLM Paper List ☆62 · Updated 4 months ago
- ☆73 · Updated 2 weeks ago
- ☆29 · Updated 7 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆249 · Updated 4 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆239 · Updated 5 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… ☆55 · Updated last year
- FuseAI Project ☆85 · Updated 2 months ago
- Delta-CoMe achieves near-lossless 1-bit compression; accepted by NeurIPS 2024 ☆57 · Updated 5 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆131 · Updated 3 weeks ago
- Mixture-of-Experts (MoE) Language Model ☆186 · Updated 7 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆112 · Updated last week
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆236 · Updated this week
- ☆94 · Updated 4 months ago
- ☆178 · Updated last week
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆54 · Updated 6 months ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Updated last year
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2B model. The project includes both model and train… ☆56 · Updated last year
- A MoE impl for PyTorch, [ATC'23] SmartMoE ☆62 · Updated last year
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆134 · Updated 9 months ago
- An Open Math Pre-training Dataset with 370B Tokens. ☆55 · Updated 2 weeks ago
- ☆314 · Updated 7 months ago
- ☆81 · Updated last year