QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆443 · Updated 4 months ago
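The headline idea: apply P learnable transformations to the input, execute P forward passes through the shared backbone in parallel, and dynamically aggregate the P outputs, scaling compute instead of parameters. Below is a minimal PyTorch sketch of that structure; the class name, prefix-style input transformation, and gating head are illustrative assumptions, not the repository's actual API:

```python
import torch
import torch.nn as nn

class ParallelScalingWrapper(nn.Module):
    """Illustrative sketch: P parallel streams through one shared backbone,
    each stream distinguished by its own learnable prefix, outputs combined
    by a learned dynamic weighting (an assumption-level reading of ParScale)."""

    def __init__(self, backbone: nn.Module, hidden: int, p_streams: int = 4):
        super().__init__()
        self.backbone = backbone                       # shared-weight model
        self.p = p_streams
        # one learnable prefix vector per stream (prefix tuning is assumed
        # here as the per-stream input transformation)
        self.prefixes = nn.Parameter(torch.randn(p_streams, 1, hidden) * 0.02)
        self.gate = nn.Linear(hidden, 1)               # per-stream aggregation score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden); assumes backbone maps that shape to itself
        outs = []
        for i in range(self.p):                        # P forward passes
            prefix = self.prefixes[i].expand(x.size(0), -1, -1)
            y = self.backbone(torch.cat([prefix, x], dim=1))
            outs.append(y[:, 1:])                      # drop the prefix position
        stacked = torch.stack(outs)                    # (P, batch, seq, hidden)
        weights = torch.softmax(self.gate(stacked), dim=0)
        return (weights * stacked).sum(dim=0)          # dynamic aggregation
```

With `nn.Identity()` as the backbone this runs end to end; in a real deployment the backbone would be the full transformer and the P streams would be batched into a single launch rather than looped.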
Alternatives and similar repositories for ParScale
Users interested in ParScale are comparing it to the repositories listed below.
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs (see the MLA sketch after this list) ☆191 · Updated last week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models (see the CoE sketch after this list) ☆220 · Updated 3 weeks ago
- ☆816 · Updated 4 months ago
- ☆203 · Updated 5 months ago
- Implementation of FP8/INT8 rollout for RL training without a performance drop.
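For the first item above: MLA's economy comes from caching a single low-rank latent per token and reconstructing per-head keys and values from it on the fly, shrinking the KV cache. A hedged PyTorch sketch of that mechanism; dimensions and names are assumptions, and the decoupled rotary-embedding path of the full design is omitted for brevity:

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Minimal MLA-style sketch: keys/values are rebuilt from a small shared
    latent, so the cache stores d_latent values per token instead of
    2 * n_heads * d_head (an illustrative reading, not the repo's code)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 64):
        super().__init__()
        self.h, self.dh = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)    # compress to latent
        self.k_up = nn.Linear(d_latent, d_model)       # reconstruct keys
        self.v_up = nn.Linear(d_latent, d_model)       # reconstruct values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.h, self.dh).transpose(1, 2)
        latent = self.kv_down(x)                       # this is what gets cached
        k = self.k_up(latent).view(b, s, self.h, self.dh).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.h, self.dh).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dh ** 0.5, dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(b, s, -1))
```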
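For the Chain of Experts item: rather than a single parallel top-k dispatch, CoE iterates the MoE routing so that experts selected in later chain steps see the outputs of earlier ones. A rough sketch under that reading; the residual wiring, dense expert loop, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ChainOfExpertsLayer(nn.Module):
    """Hedged CoE sketch: several sequential chain iterations, each re-routing
    the previous iteration's output through newly selected experts, which lets
    experts within one layer communicate."""

    def __init__(self, d_model: int = 256, n_experts: int = 8,
                 top_k: int = 2, chain_len: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k, self.chain_len = top_k, chain_len

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.chain_len):               # sequential chain steps
            logits = self.router(h)                   # re-route based on current h
            weights, idx = torch.topk(torch.softmax(logits, -1), self.top_k, -1)
            out = torch.zeros_like(h)
            for k in range(self.top_k):               # dense loop for clarity,
                for e, expert in enumerate(self.experts):   # not efficiency
                    mask = (idx[..., k] == e).unsqueeze(-1)
                    out = out + mask * weights[..., k:k + 1] * expert(h)
            h = h + out                               # residual keeps the chain stable
        return h
```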