stan-anony / derivative_free_lora_rank
☆15 Updated 11 months ago
Alternatives and similar repositories for derivative_free_lora_rank:
Users who are interested in derivative_free_lora_rank are comparing it to the repositories listed below:
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?" ☆41 Updated 2 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆23 Updated 7 months ago
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits" ☆13 Updated 3 months ago
- [ICML'2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen ☆16 Updated 4 months ago
- ☆27 Updated last year
- Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆14 Updated 9 months ago
- ☆38 Updated 11 months ago
- Code for T-MARS data filtering ☆35 Updated last year
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆27 Updated 10 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆19 Updated last month
- A Closer Look into Mixture-of-Experts in Large Language Models ☆41 Updated 5 months ago
- ☆18 Updated 6 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆28 Updated 7 months ago
- FocusLLM: Scaling LLM’s Context by Parallel Decoding ☆34 Updated last month
- ☆16 Updated 6 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆31 Updated 8 months ago
- Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi… ☆16 Updated last month
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 Updated last year
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆27 Updated 2 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 Updated last year
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View ☆46 Updated 3 months ago
- ☆19 Updated last year
- PyTorch implementation of StableMask (ICML'24) ☆12 Updated 7 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last year
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024). ☆27 Updated 10 months ago
- ☆15 Updated 6 months ago
- ☆33 Updated last year
- Long Context Extension and Generalization in LLMs ☆40 Updated 4 months ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆15 Updated 5 months ago