OscarXZQ / weight-selection
☆179 · Updated 5 months ago
Alternatives and similar repositories for weight-selection:
Users interested in weight-selection are comparing it to the libraries listed below.
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆101 · Updated 6 months ago
- Official code for "TOAST: Transfer Learning via Attention Steering" ☆189 · Updated last year
- Official code for our CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space” ☆249 · Updated last year
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆174 · Updated 2 months ago
- ☆50 · Updated last year
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ☆101 · Updated 2 months ago
- A simple minimal implementation of Reversible Vision Transformers ☆122 · Updated last year
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆85 · Updated last week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 5 months ago
- ☆101 · Updated last year
- Official implementation of the AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers" ☆104 · Updated last year
- [EMNLP 2022] Official implementation of TransNormer from the paper "The Devil in Linear Transformer" ☆60 · Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆52 · Updated 6 months ago
- A repository for DenseSSMs ☆87 · Updated 11 months ago
- PB-LLM: Partially Binarized Large Language Models ☆151 · Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆99 · Updated 9 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- Official PyTorch implementation of A-ViT: Adaptive Tokens for Efficient Vision Transformer (CVPR 2022) ☆153 · Updated 2 years ago
- Matryoshka Multimodal Models ☆98 · Updated last month
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 · Updated 10 months ago
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆205 · Updated 2 weeks ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆99 · Updated last year
- Official PyTorch implementation for the paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆50 · Updated last month
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆269 · Updated 10 months ago
- Model Stock: All we need is just a few fine-tuned models ☆106 · Updated 5 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆214 · Updated 3 weeks ago
- [CVPR 2025] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive… ☆235 · Updated 2 months ago
- ☆42 · Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆150 · Updated 3 months ago
- Language Quantized AutoEncoders ☆102 · Updated 2 years ago