kpup1710 / CAMEx
[ICLR 2025] CAMEx: Curvature-Aware Merging of Experts
☆20 · Updated 3 months ago
Alternatives and similar repositories for CAMEx
Users interested in CAMEx are comparing it to the libraries listed below.
- LibMoE: A LIBRARY FOR COMPREHENSIVE BENCHMARKING MIXTURE OF EXPERTS IN LARGE LANGUAGE MODELS ☆39 · Updated last month
- ☆21 · Updated 9 months ago
- From Implicit to Explicit Feedback: A deep neural network for modeling sequential behaviours and long-short term preferences of online us… ☆1 · Updated last year
- RecGPT: Generative Pre-training for Text-based Recommendation (ACL 2024) ☆33 · Updated 8 months ago
- This is the public GitHub repository for our paper "Transformer with a Mixture of Gaussian Keys" ☆27 · Updated 2 years ago
- [CVPR 2025] h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform ☆48 · Updated 2 months ago
- ☆61 · Updated 4 months ago
- [ICLR 2024] Official implementation of Bellman Optimal Stepsize Straightening of Flow-Matching Models ☆35 · Updated last year
- ☆16 · Updated last year
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 · Updated last month
- Pioneering in Vietnamese Multimodal Large Language Model ☆47 · Updated 4 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆40 · Updated 7 months ago
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning ☆16 · Updated last week
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆30 · Updated last year
- Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation ☆67 · Updated this week
- ☆8 · Updated 3 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆73 · Updated last year
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆31 · Updated 7 months ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆31 · Updated last year
- Toy reproduction of the Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts ☆15 · Updated 9 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆54 · Updated 11 months ago
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆23 · Updated 11 months ago
- The repository for the paper "Personalized Multimodal Response Generation with Large Language Models" ☆14 · Updated 11 months ago
- [IJCAI'23] The official GitHub page of the paper "Diffusion Models for Non-autoregressive Text Generation: A Survey" ☆31 · Updated last year
- [WSDM 2024] Official PyTorch Implementation of Linear Recurrent Units for Sequential Recommendation (LRURec) ☆58 · Updated 3 months ago
- Official implementation of Learning to Discretize Denoising Diffusion ODEs ☆22 · Updated 2 weeks ago
- Switch EMA: A Free Lunch for Better Flatness and Sharpness ☆26 · Updated last year
- ☆21 · Updated 2 years ago
- Source code for the paper "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" ☆26 · Updated 11 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆38 · Updated 7 months ago