Toy reproduction of "Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts"
☆31Sep 1, 2024Updated last year
Alternatives and similar repositories for lossfreebalance
Users who are interested in lossfreebalance are comparing it to the libraries listed below.
- ☆36Feb 26, 2024Updated 2 years ago
- Use the tokenizer in parallel to achieve superior acceleration☆20Mar 21, 2024Updated 2 years ago
- mobile DFF dataset☆13Nov 26, 2018Updated 7 years ago
- A free and open-source focus stacking software that supports multi-focus image alignment and fusion.☆25Feb 5, 2026Updated 2 months ago
- Code for the paper "No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations"☆12Oct 31, 2024Updated last year
- ROS package for sending robots to a series of waypoints☆10Dec 10, 2021Updated 4 years ago
- [CVPR2024] Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation☆19Sep 3, 2024Updated last year
- Code for the paper All-in-focus Imaging from Event Focal Stack, CVPR 2023.☆14Oct 3, 2025Updated 6 months ago
- Official Pytorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning'? (ICLR2024)☆13Mar 8, 2024Updated 2 years ago
- Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR☆13Jul 21, 2025Updated 9 months ago
- Spectral Sphere Optimizer☆114Mar 23, 2026Updated last month
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated 2 years ago
- Code for the paper "Representing Spatial Trajectories as Distributions"☆13Jan 17, 2023Updated 3 years ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆98Apr 7, 2026Updated 3 weeks ago
- [AAAI 2023] Official implementation of FiTs: Fine-grained Two-stage Training for Knowledge Base Question Answering☆11Mar 10, 2023Updated 3 years ago
- Triton implementation of bi-directional (non-causal) linear attention☆75Mar 1, 2026Updated last month
- DCIC22 (Digital China 2022) cattle image segmentation competition, 4th-place solution☆14Jul 18, 2022Updated 3 years ago
- 3D deformable convolution network(DCN) for head and neck tumor segmentation☆11May 4, 2023Updated 2 years ago
- ☆21Apr 14, 2025Updated last year
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- i-mae Pytorch Repo☆20Apr 6, 2024Updated 2 years ago
- [ICML2022] "Identity-Disentangled Adversarial Augmentation for Self-Supervised Learning"☆10Jul 24, 2022Updated 3 years ago
- (WACV'24) Kaizen: Practical self-supervised continual learning with continual fine-tuning☆16Oct 29, 2024Updated last year
- Pytorch routines for (Ker)nel (Mac)hines☆12Oct 10, 2025Updated 6 months ago
- Official TensorFlow implementation of "RECALL: Replay-based Continual Learning in Semantic Segmentation", ICCV 2021☆19Oct 7, 2021Updated 4 years ago
- ☆21Dec 23, 2024Updated last year
- Simple MoE - Day 17 of 365 Days of Repos☆18Apr 21, 2026Updated last week
- This is the official PyTorch implementation of ASAG (ICCV 2023).☆18Sep 9, 2023Updated 2 years ago
- [cvpr2023] implementation of out-of-candidate rectification methods☆15Feb 28, 2023Updated 3 years ago
- Exploring the minimal architecture required for coherent English language generation.☆13Updated this week
- Code for "Learning Unitary Operators with Help From u(n)", AAAI-17. (https://arxiv.org/abs/1607.04903)☆17Jan 10, 2017Updated 9 years ago
- [WACV 2024] BALF: Simple and Efficient Blur Aware Local Feature Detector☆27Mar 9, 2026Updated last month
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated 2 months ago
- AlignX-Family is an open-source research suite for advancing personalization in large language models-spanning data, code, models, and be…☆20Jan 12, 2026Updated 3 months ago
- [ACM MM'23] Official implementation of paper "Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty".☆14Nov 22, 2023Updated 2 years ago
- Lecture Notes for Learning the Julia language☆12Nov 18, 2019Updated 6 years ago
- Reinforcing Long-Term Performance in Recommender Systems with User-Oriented Exploration Policy (SIGIR 2024)☆14Oct 6, 2024Updated last year
- ☆10Dec 9, 2021Updated 4 years ago
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆16Sep 7, 2024Updated last year