kpup1710 / CAMEx
[ICLR 2025] CAMEx: Curvature-Aware Merging of Experts
☆22 · Updated 9 months ago
Alternatives and similar repositories for CAMEx
Users interested in CAMEx are comparing it to the repositories listed below.
- LibMoE: A library for comprehensive benchmarking of Mixture of Experts in large language models ☆45 · Updated last week
- One-stop solutions for Mixture of Experts and Mixture of Depths modules in PyTorch. ☆25 · Updated 6 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆57 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆79 · Updated 2 years ago
- ☆187 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆132 · Updated last month
- ☆34 · Updated 10 months ago
- ☆36 · Updated 8 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆33 · Updated last year
- ☆151 · Updated last year
- [ICLR 2025] Official code release for "Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation" ☆47 · Updated 9 months ago
- ☆35 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆45 · Updated last month
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆134 · Updated last month
- [NeurIPS '25] Multi-Token Prediction Needs Registers ☆25 · Updated last week
- ☆76 · Updated 10 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Updated last year
- Inference Speed Benchmark for "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆75 · Updated last year
- Experiments on Multi-Head Latent Attention ☆99 · Updated last year
- Official PyTorch implementation for "Vision-Language Models Create Cross-Modal Task Representations" (ICML 2025) ☆31 · Updated 7 months ago
- Official PyTorch implementation of "DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs" (ICML 2025 Oral) ☆51 · Updated 5 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Updated 2 years ago
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 11 months ago
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆24 · Updated last year
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆36 · Updated last year
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆231 · Updated last month
- SLTrain: a sparse plus low-rank approach for parameter- and memory-efficient pretraining (NeurIPS 2024) ☆38 · Updated last year
- ☆21 · Updated 2 years ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆39 · Updated last year
- Official code for our paper "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆137 · Updated 8 months ago