ambisinister / lossfreebalance
toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
☆30Sep 1, 2024Updated last year
Alternatives and similar repositories for lossfreebalance
Users that are interested in lossfreebalance are comparing it to the libraries listed below
- Use the tokenizer in parallel to achieve superior acceleration☆20Mar 21, 2024Updated last year
- [AAAI 2025] Holistic Semantic Representation for Navigational Trajectory Generation☆17Sep 12, 2025Updated 5 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025)☆29Feb 6, 2026Updated last week
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.☆106Dec 20, 2024Updated last year
- ☆14Jan 23, 2026Updated 3 weeks ago
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆15Sep 7, 2024Updated last year
- [CVPR2024] Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation☆19Sep 3, 2024Updated last year
- This is code for the EMNLP 2022 Paper "UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation".☆10Apr 30, 2023Updated 2 years ago
- OLD Codebase for Intelligent Systems 2020 and Project AI, Vrije Universiteit Amsterdam☆12Jan 10, 2023Updated 3 years ago
- Exploring the minimal architecture required for coherent English language generation.☆12Mar 5, 2025Updated 11 months ago
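- Re-identification of campus bicycles in surveillance-quality footage, including ReID model recognition, vector database retrieval, and a UI display☆10Feb 13, 2024Updated 2 years ago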
- Experiments on Multi-Head Latent Attention☆99Aug 19, 2024Updated last year
- Spectral Sphere Optimizer☆96Jan 14, 2026Updated last month
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆89Jan 29, 2026Updated 2 weeks ago
- The 30 must-read deep learning papers recommended by Ilya Sutskever (bilingual Chinese-English translated edition)☆13Dec 18, 2024Updated last year
- ☆13May 30, 2022Updated 3 years ago
- Training a BERT model from scratch.☆11Oct 15, 2023Updated 2 years ago
- [ICML 2024] Code for the paper "MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts"☆10Jul 1, 2024Updated last year
- Our paper is titled "NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Networks for better Cause-Effect Span Detection".☆13Feb 11, 2022Updated 4 years ago
- An Enterprise LLM chat system using LibreChat, AWS Bedrock and LDAP/AD Authentication☆13Nov 25, 2025Updated 2 months ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- Math evaluations of llama models.☆10Jan 3, 2024Updated 2 years ago
- [AAAI 2023] Official implementation of FiTs: Fine-grained Two-stage Training for Knowledge Base Question Answering☆11Mar 10, 2023Updated 2 years ago
- Code for "Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning" (EMNLP 2022) and "Empowering Parameter-Efficient Transfer Learning…☆11Feb 6, 2023Updated 3 years ago
- A project designed to build and render a full Minecraft crafting tree.☆10Aug 10, 2021Updated 4 years ago
- Code for running forward and backward versions of GPT2☆10Nov 20, 2021Updated 4 years ago
- Quantization of LLMs and benchmarking.☆10Apr 3, 2024Updated last year
- NOMU: Neural Optimization-based Model Uncertainty☆10Feb 17, 2023Updated 3 years ago
- ☆11Jan 21, 2024Updated 2 years ago
- This is a sample project showing a concrete use case of Python's multithreading.☆11Oct 6, 2020Updated 5 years ago
- ☆12Dec 30, 2020Updated 5 years ago
- Simple MoE - Day 17 of 365 Days of Repos☆16Jan 17, 2025Updated last year
- ☆11Dec 15, 2025Updated 2 months ago
- [KDD 2025 D&B] CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks.☆47Jul 15, 2025Updated 7 months ago
- Mamba SSM architecture that supports training on variable-length sequences☆12Sep 1, 2025Updated 5 months ago
- (ACM MM24) This is the official repository of GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction.☆11Jan 28, 2024Updated 2 years ago
- [ICML2022] "Identity-Disentangled Adversarial Augmentation for Self-Supervised Learning"☆10Jul 24, 2022Updated 3 years ago
- Artifacts for SoK: Can Trajectory Generation Combine Privacy and Utility?☆15Jun 27, 2024Updated last year