PyTorch implementation of StableMask (ICML'24)
☆15 · Updated Jun 27, 2024
Alternatives and similar repositories for StableMask
Users interested in StableMask are comparing it to the libraries listed below.
- ☆24 · Updated Sep 25, 2024
- Recipe for training fully-featured self-supervised image JEPA models ☆12 · Updated Jun 4, 2025
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆15 · Updated Feb 12, 2026
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆31 · Updated Apr 8, 2024
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths and finetune the quantized LLMs ☆15 · Updated Jul 18, 2024
- ☆17 · Updated Jun 11, 2025
- ☆20 · Updated May 30, 2024
- Code for the EMNLP 2024 paper "A simple and effective L2 norm based method for KV Cache compression" ☆18 · Updated Dec 13, 2024
- PyTorch and NNsight implementation of AtP* (Kramar et al., 2024, DeepMind) ☆20 · Updated Jan 19, 2025
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity" ☆30 · Updated Nov 12, 2024
- Scalable and Stable Parallelization of Nonlinear RNNs ☆29 · Updated Oct 21, 2025
- Code for the paper "Function-Space Learning Rates" ☆25 · Updated Jun 3, 2025
- ☆23 · Updated Jun 18, 2024
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated Aug 20, 2024
- ☆28 · Updated Oct 28, 2024
- ☆23 · Updated Oct 15, 2024
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated Jun 6, 2024
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆31 · Updated Jan 28, 2026
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens ☆25 · Updated Nov 6, 2023
- ☆27 · Updated May 3, 2024
- Checkpointable dataset utilities for foundation model training ☆32 · Updated Jan 29, 2024
- ☆106 · Updated Mar 9, 2024
- Code for "Optimizing DDPM Sampling with Shortcut Fine-Tuning" (https://arxiv.org/abs/2301.13362), ICML 2023 ☆30 · Updated Oct 6, 2023
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated Apr 17, 2024
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆29 · Updated Feb 6, 2026
- Official implementation of the transformer (TF) architecture suggested in the paper "Looped Transformers as Programmable Computers…" ☆30 · Updated Apr 8, 2023
- ☆34 · Updated Sep 10, 2024
- ☆29 · Updated Feb 27, 2024
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Updated Jun 12, 2024
- ☆35 · Updated Apr 12, 2024
- Flash Attention Triton kernel with support for second-order derivatives ☆142 · Updated Feb 4, 2026
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆34 · Updated Mar 8, 2025
- ☆33 · Updated Nov 4, 2024
- BitLinear implementation ☆35 · Updated Jan 1, 2026
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ☆30 · Updated Mar 14, 2024
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆41 · Updated Jan 29, 2026
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆231 · Updated Aug 2, 2024
- ☆35 · Updated Dec 12, 2023
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆36 · Updated Jun 7, 2024