Toy reproduction of "Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts"
☆31Sep 1, 2024Updated last year
Alternatives and similar repositories for lossfreebalance
Users that are interested in lossfreebalance are comparing it to the libraries listed below.
- Implementation for POET and POET-X for LLM pretraining☆33Mar 12, 2026Updated 2 months ago
- ☆39Feb 26, 2024Updated 2 years ago
- Use the tokenizer in parallel to achieve superior acceleration☆20Mar 21, 2024Updated 2 years ago
- ☆26Jun 29, 2025Updated 11 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.☆115Dec 20, 2024Updated last year
- [CVPR2024] Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation☆19Sep 3, 2024Updated last year
- Official Pytorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning'? (ICLR2024)☆13Mar 8, 2024Updated 2 years ago
- Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR☆14Jul 21, 2025Updated 10 months ago
- Spectral Sphere Optimizer☆118Mar 23, 2026Updated 2 months ago
- Automated neural architecture search algorithms implemented in PyTorch and Autogluon toolkit.☆12Apr 17, 2020Updated 6 years ago
- Code for the paper "Representing Spatial Trajectories as Distributions"☆13Jan 17, 2023Updated 3 years ago
- an official PyTorch implementation of the paper "Partial Network Cloning", CVPR 2023☆13Mar 21, 2023Updated 3 years ago
- [AAAI 2023] Official implementation of FiTs: Fine-grained Two-stage Training for Knowledge Base Question Answering☆11Mar 10, 2023Updated 3 years ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆105Apr 7, 2026Updated 2 months ago
- DCIC22 (Digital China 2022) cattle image segmentation competition: fourth-place solution☆14Jul 18, 2022Updated 3 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated 2 years ago
- 3D deformable convolution network(DCN) for head and neck tumor segmentation☆11May 4, 2023Updated 3 years ago
- Compact and Agent-Native MoE Training System☆144Updated this week
- Project repo for gpSLDS☆19Jan 12, 2026Updated 4 months ago
- Code for "Boosting Semi-supervised Image Segmentation with Global and Local Mutual Information Regularization"☆13Jul 14, 2021Updated 4 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆11Dec 30, 2024Updated last year
- i-mae Pytorch Repo☆20Apr 6, 2024Updated 2 years ago
- [ICML2022] "Identity-Disentangled Adversarial Augmentation for Self-Supervised Learning"☆10Jul 24, 2022Updated 3 years ago
- Pytorch routines for (Ker)nel (Mac)hines☆12Oct 10, 2025Updated 7 months ago
- Stanford Cars dataset by classes folder☆20Nov 7, 2024Updated last year
- [ICML 2026] Esoteric Language Models☆118May 1, 2026Updated last month
- ☆22Dec 23, 2024Updated last year
- ☆12Dec 30, 2020Updated 5 years ago
- Simple MoE - Day 17 of 365 Days of Repos☆19Updated this week
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆85Jan 12, 2025Updated last year
- This is the official PyTorch implementation of ASAG (ICCV 2023).☆18Sep 9, 2023Updated 2 years ago
- [cvpr2023] implementation of out-of-candidate rectification methods☆15Feb 28, 2023Updated 3 years ago
- A project designed to build and render a full Minecraft crafting tree.☆10Aug 10, 2021Updated 4 years ago
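- Re-identification of campus bicycles in surveillance-quality footage, including ReID model recognition, vector database retrieval, and a UI display☆11Feb 13, 2024Updated 2 years ago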
- Code for "Learning Unitary Operators with Help From u(n)", AAAI-17. (https://arxiv.org/abs/1607.04903)☆17Jan 10, 2017Updated 9 years ago
- uncertainty-guided matting on ICML2023☆12Aug 3, 2023Updated 2 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated 3 months ago
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)☆17Nov 15, 2024Updated last year
- [ACM MM'23] Official implementation of paper "Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty".☆14Nov 22, 2023Updated 2 years ago