toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
☆31 · Sep 1, 2024 · Updated last year
Alternatives and similar repositories for lossfreebalance
Users that are interested in lossfreebalance are comparing it to the libraries listed below.
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Aug 5, 2024 · Updated last year
- Implementation for POET and POET-X for LLM pretraining ☆37 · Jun 9, 2026 · Updated 2 weeks ago
- ☆39 · Feb 26, 2024 · Updated 2 years ago
- [KDD 2025] MM-Path: Multi-modal, Multi-granularity Path Representation Learning. ☆16 · Jan 9, 2025 · Updated last year
- [AAAI 2025] Holistic Semantic Representation for Navigational Trajectory Generation ☆19 · Mar 7, 2026 · Updated 3 months ago
- Code for the paper "No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations" ☆11 · Oct 31, 2024 · Updated last year
- [CVPR2024] Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation ☆19 · Sep 3, 2024 · Updated last year
- official code for paper Probing the Decision Boundaries of In-context Learning in Large Language Models. https://arxiv.org/abs/2406.11233… ☆20 · Jul 27, 2025 · Updated 11 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆30 · Feb 6, 2026 · Updated 4 months ago
- Automated neural architecture search algorithms implemented in PyTorch and Autogluon toolkit. ☆12 · Apr 17, 2020 · Updated 6 years ago
- Code for the paper "Representing Spatial Trajectories as Distributions" ☆13 · Jan 17, 2023 · Updated 3 years ago
- Cross Visual Prompt Tuning [ICCV 2025] ☆13 · Aug 3, 2025 · Updated 10 months ago
- an official PyTorch implementation of the paper "Partial Network Cloning", CVPR 2023 ☆13 · Mar 21, 2023 · Updated 3 years ago
- ☆35 · Mar 17, 2026 · Updated 3 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆76 · Mar 1, 2026 · Updated 3 months ago
- ☆15 · Mar 30, 2025 · Updated last year
- MIPS five-stage pipelined CPU supporting 57 instructions (Verilog implementation with detailed comments) ☆11 · Jan 11, 2022 · Updated 4 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated 2 years ago
- seminar for undergraduates ☆16 · Jun 8, 2021 · Updated 5 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆11 · Dec 30, 2024 · Updated last year
- Universal memory runtime for AI agents ☆49 · Jun 18, 2026 · Updated last week
- (WACV'24) Kaizen: Practical self-supervised continual learning with continual fine-tuning ☆17 · Oct 29, 2024 · Updated last year
- Vietnamese diacritics restoration ☆13 · Jan 18, 2016 · Updated 10 years ago
- Pytorch routines for (Ker)nel (Mac)hines ☆12 · Oct 10, 2025 · Updated 8 months ago
- Stanford Cars dataset by classes folder ☆21 · Nov 7, 2024 · Updated last year
- Experiments on Multi-Head Latent Attention ☆101 · Aug 19, 2024 · Updated last year
- ☆22 · Dec 23, 2024 · Updated last year
- ☆12 · Dec 30, 2020 · Updated 5 years ago
- Simple MoE - Day 17 of 365 Days of Repos ☆20 · Jun 2, 2026 · Updated 3 weeks ago
- [CVPR' 26] MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts ☆45 · Apr 27, 2026 · Updated 2 months ago
- [cvpr2023] implementation of out-of-candidate rectification methods ☆15 · Feb 28, 2023 · Updated 3 years ago
- A project designed to build and render a full Minecraft crafting tree. ☆10 · Aug 10, 2021 · Updated 4 years ago
- Exploring the minimal architecture required for coherent English language generation. ☆13 · Jun 11, 2026 · Updated 2 weeks ago
- Re-identification of campus bicycles in surveillance-quality footage, including ReID model recognition, vector database retrieval, and a UI demo ☆11 · Feb 13, 2024 · Updated 2 years ago
- [WACV 2024] BALF: Simple and Efficient Blur Aware Local Feature Detector ☆29 · Mar 9, 2026 · Updated 3 months ago
- uncertainty-guided matting on ICML2023 ☆12 · Aug 3, 2023 · Updated 2 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. ☆22 · Feb 19, 2026 · Updated 4 months ago
- AlignX-Family is an open-source research suite for advancing personalization in large language models-spanning data, code, models, and be… ☆20 · Jan 12, 2026 · Updated 5 months ago
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) ☆17 · Nov 15, 2024 · Updated last year