Toy reproduction of the Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
☆31 · Sep 1, 2024 · Updated last year
Alternatives and similar repositories for lossfreebalance
Users who are interested in lossfreebalance are comparing it to the libraries listed below
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated last year
- [KDD 2025] MM-Path: Multi-modal, Multi-granularity Path Representation Learning. ☆16 · Jan 9, 2025 · Updated last year
- [AAAI 2025] Holistic Semantic Representation for Navigational Trajectory Generation ☆18 · Mar 2, 2026 · Updated last week
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆29 · Feb 6, 2026 · Updated last month
- ☆36 · Feb 26, 2024 · Updated 2 years ago
- ☆12 · Dec 20, 2018 · Updated 7 years ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆105 · Dec 20, 2024 · Updated last year
- [CVPR2024] Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation ☆19 · Sep 3, 2024 · Updated last year
- Exploring the minimal architecture required for coherent English language generation. ☆12 · Mar 5, 2025 · Updated last year
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer ☆16 · Sep 7, 2024 · Updated last year
- ☆14 · Jan 23, 2026 · Updated last month
- Vietnamese diacritics restoration ☆14 · Jan 18, 2016 · Updated 10 years ago
- Experiments on Multi-Head Latent Attention ☆100 · Aug 19, 2024 · Updated last year
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆92 · Updated this week
- Code for "Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning" (EMNLP 2022) and "Empowering Parameter-Efficient Transfer Learning… ☆11 · Feb 6, 2023 · Updated 3 years ago
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering (CVPR'23) ☆14 · Nov 4, 2025 · Updated 4 months ago
- [AAAI 2023] Official implementation of FiTs: Fine-grained Two-stage Training for Knowledge Base Question Answering ☆11 · Mar 10, 2023 · Updated 3 years ago
- Code for running forward and backward versions of GPT2 ☆10 · Nov 20, 2021 · Updated 4 years ago
- A project designed to build and render a full Minecraft crafting tree. ☆10 · Aug 10, 2021 · Updated 4 years ago
- Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR ☆13 · Jul 21, 2025 · Updated 7 months ago
- 30 must-read deep learning papers recommended by Ilya Sutskever (bilingual Chinese-English translated edition) ☆13 · Dec 18, 2024 · Updated last year
- PyTorch routines for (Ker)nel (Mac)hines ☆11 · Oct 10, 2025 · Updated 5 months ago
- Quantization of LLMs and benchmarking. ☆10 · Apr 3, 2024 · Updated last year
- ☆20 · Jul 23, 2025 · Updated 7 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 7 months ago
- Soochow University graduate thesis TeX template ☆18 · Feb 27, 2026 · Updated last week
- ☆12 · Jun 15, 2023 · Updated 2 years ago
- ☆12 · Dec 30, 2020 · Updated 5 years ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- This tutorial aims to improve the Fedora Linux experience for everyday use. ☆15 · Aug 13, 2024 · Updated last year
- Tutorials for MATH 4432 Statistical Machine Learning, HKUST, Fall 2022 ☆11 · Sep 17, 2024 · Updated last year
- A sample project demonstrating a concrete use case of Python's multithreading. ☆11 · Oct 6, 2020 · Updated 5 years ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Dec 30, 2024 · Updated last year
- Training a BERT model from scratch. ☆11 · Oct 15, 2023 · Updated 2 years ago
- This is a repository for the RM2021 Software tutorial ☆11 · Nov 4, 2020 · Updated 5 years ago
- NOMU: Neural Optimization-based Model Uncertainty ☆10 · Feb 17, 2023 · Updated 3 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. ☆22 · Feb 19, 2026 · Updated 2 weeks ago
- An Enterprise LLM chat system using LibreChat, AWS Bedrock and LDAP/AD Authentication ☆15 · Updated this week
- Code for the paper "No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations" ☆12 · Oct 31, 2024 · Updated last year