SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆128Updated 9 months ago
Alternatives and similar repositories for Skywork-MoE:
Users that are interested in Skywork-MoE are comparing it to the libraries listed below
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆121Updated 2 months ago
- Mixture-of-Experts (MoE) Language Model☆185Updated 6 months ago
- code for Scaling Laws of RoPE-based Extrapolation☆70Updated last year
- Delta-CoMe can achieve near-lossless 1-bit compression and has been accepted by NeurIPS 2024☆54Updated 3 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆241Updated 2 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆226Updated 3 weeks ago
- Reformatted Alignment☆114Updated 5 months ago
- FuseAI Project☆83Updated last month
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.☆53Updated 5 months ago
- ☆96Updated 11 months ago
- ☆101Updated 3 months ago
- ☆28Updated 6 months ago
- ☆44Updated 8 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…☆55Updated last year
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];☆36Updated last year
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆127Updated 7 months ago
- A MoE impl for PyTorch, [ATC'23] SmartMoE☆61Updated last year
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…☆56Updated 10 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆164Updated 3 weeks ago
- ☆59Updated 3 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings☆153Updated 9 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆40Updated last year