kpup1710 / CAMExLinks
[ICLR 2025] CAMEx: Curvature-Aware Merging of Experts
☆22 · Updated 6 months ago
Alternatives and similar repositories for CAMEx
Users interested in CAMEx are comparing it to the repositories listed below.
- LibMoE: a library for comprehensive benchmarking of Mixture of Experts in large language models ☆41 · Updated 3 months ago
- ☆22 · Updated last year
- ☆73 · Updated 7 months ago
- One-stop solutions for Mixture of Experts and Mixture of Depth modules in PyTorch ☆24 · Updated 3 months ago
- A More Fair and Comprehensive Comparison between KAN and MLP ☆174 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆56 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆132 · Updated last week
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆42 · Updated 11 months ago
- ☆35 · Updated 6 months ago
- Unofficial implementation of the Selective Attention Transformer ☆17 · Updated 10 months ago
- ☆183 · Updated 11 months ago
- Code for NOLA: "Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆128 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated last year
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆124 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆111 · Updated 8 months ago
- User-friendly implementation of Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert-choice routing. ☆26 · Updated 4 months ago
- A curated list of model merging methods ☆92 · Updated last year
- Code for the NeurIPS 2024 Spotlight "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆83 · Updated 10 months ago
- Official PyTorch implementation of "Vision-Language Models Create Cross-Modal Task Representations" (ICML 2025) ☆31 · Updated 4 months ago
- The official repository for continuous speculative decoding ☆29 · Updated 5 months ago
- Official implementation of "Equivariant Architectures for Learning in Deep Weight Spaces" [ICML 2023] ☆89 · Updated 2 years ago
- ☆148 · Updated last year
- Source code for the paper "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" ☆31 · Updated last year
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆21 · Updated 2 months ago
- The official repository for "HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction" ☆39 · Updated 5 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 · Updated last year
- ☆30 · Updated last year
- [ICML'24 Oral] The official code for "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear attention ☆102 · Updated last year
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆31 · Updated last year