kpup1710 / CAMEx
[ICLR 2025] CAMEx: Curvature-Aware Merging of Experts
☆22 · Updated 6 months ago
Alternatives and similar repositories for CAMEx
Users interested in CAMEx are comparing it to the libraries listed below.
- LibMoE: A Library for Comprehensive Benchmarking Mixture of Experts in Large Language Models ☆40 · Updated 2 months ago
- ☆72 · Updated 6 months ago
- Official code for the paper "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation" ☆121 · Updated 2 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆41 · Updated 10 months ago
- ☆148 · Updated 11 months ago
- ☆21 · Updated 11 months ago
- PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 · Updated 2 years ago
- ☆23 · Updated 7 months ago
- ☆182 · Updated 11 months ago
- Unofficial implementation of the Selective Attention Transformer ☆17 · Updated 10 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆127 · Updated last year
- One-stop solutions for Mixture of Experts and Mixture of Depth modules in PyTorch. ☆24 · Updated 3 months ago
- Official PyTorch implementation and models for the paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆86 · Updated this week
- We study toy models of skill learning. ☆30 · Updated 7 months ago
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ☆86 · Updated 11 months ago
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆53 · Updated last year
- A More Fair and Comprehensive Comparison between KAN and MLP ☆172 · Updated last year
- Conference schedule, top papers, and analysis of the data for NeurIPS 2023! ☆120 · Updated last year
- Official code repository for the paper "Continuous Diffusion Model for Language Modeling" ☆40 · Updated 5 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆114 · Updated 11 months ago
- Official implementation of "Equivariant Architectures for Learning in Deep Weight Spaces" [ICML 2023] ☆89 · Updated 2 years ago
- Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral) ☆37 · Updated 2 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆118 · Updated last month
- ☆30 · Updated last year
- User-friendly implementation of Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert-choice rou… ☆26 · Updated 3 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆124 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆56 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆77 · Updated last year
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆104 · Updated last week
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆120 · Updated 10 months ago