yikangshen / MoALinks

Mixture of Attention Heads

☆51

Alternatives and similar repositories for MoA

Users that are interested in MoA are comparing it to the libraries listed below

Sorting:

BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆33Updated 2 years ago
SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation
☆47Updated 2 years ago
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33Updated 2 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆87Updated 2 years ago
twinkle0331 / LGTM
[ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation”(https://arxiv.org/abs/2305.…
☆38Updated 2 years ago
HenryHZY / VL-PET
[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
☆52Updated 2 years ago
chuanyang-Zheng / DAPE
The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
☆39Updated last year
ShiZhengyan / DePT
[ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"
☆99Updated last year
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56Updated 2 years ago
yegcjs / DiffusionLLM
Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
☆83Updated last year
xiangyu-mm / EasyGen
The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"
☆74Updated last year
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆17Updated 8 months ago
kugwzk / DiDE
Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”
☆31Updated 2 years ago
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆40Updated last year
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34Updated 2 years ago
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44Updated last year
TobiasLee / VEC
Visual and Embodied Concepts evaluation benchmark
☆21Updated 2 years ago
Victorwz / VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56Updated 2 years ago
TsinghuaC3I / SoRA
[EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models
☆85Updated last year
Pbihao / SLM
☆28Updated last year
fuzihaofzh / AnalyzeParameterEfficientFinetune
On the Effectiveness of Parameter-Efficient Fine-Tuning
☆38Updated 2 years ago
berlino / gated_linear_attention
☆106Updated last year
lancopku / DynamicKD
Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models"
☆41Updated 3 years ago
mlfoundations / VisIT-Bench
☆50Updated 2 years ago
r-three / smear
☆30Updated 2 years ago
yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆33Updated last year
adymaharana / d2pruning
☆43Updated 2 years ago
mshukor / eP-ALM
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
☆27Updated 2 years ago
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆63Updated 2 years ago
MrZilinXiao / ProxyThinker
Official Implementation of ProxyThinker: Test-Time Guidance through Small Visual Reasoners.
☆16Updated 2 months ago