stan-anony / derivative_free_lora_rank

☆16

Alternatives and similar repositories for derivative_free_lora_rank

Users that are interested in derivative_free_lora_rank are comparing it to the libraries listed below

Sorting:

JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433
☆25Updated 5 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆33Updated 10 months ago
Infini-AI-Lab / S2FT
☆17Updated 4 months ago
imagination-research / LCSC
[ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
☆14Updated 3 months ago
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆22Updated 10 months ago
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Updated last year
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆21Updated 4 months ago
thunlp / Modularity-Analysis
Repo for ACL2023 Findings paper "Emergent Modularity in Pre-trained Transformers"
☆23Updated last year
duykhuongnguyen / LASeR-MAB
Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits"
☆13Updated 7 months ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28Updated last year
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆51Updated 2 years ago
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16Updated 3 weeks ago
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆21Updated 6 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆72Updated last month
shoaibahmed / llm_depth_pruning
Official implementation of the paper: "A deeper look at depth pruning of LLMs"
☆15Updated 9 months ago
uservan / ThinkPO
☆20Updated 2 months ago
sail-sg / ActivePRM
☆15Updated last month
yikangshen / MoA
Mixture of Attention Heads
☆44Updated 2 years ago
yale-nlp / refdpo
☆16Updated 9 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆51Updated 3 months ago
andyjm3 / SLTrain
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
☆31Updated 6 months ago
zju-vipa / training_free_model_merging
This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024).
☆29Updated last year
VITA-Group / LoCoCo
[ICML‘2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen
☆16Updated 8 months ago
wjxts / RegularizedBN
☆21Updated 2 years ago
TsinghuaC3I / SoRA
[EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models
☆76Updated last year
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆32Updated last year
cliang1453 / task-aware-distillation
Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)
☆34Updated last year
shizhediao / Black-Box-Prompt-Learning
Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models"
☆55Updated last year