lcqysl / FrameThinker-RL
☆31Updated 2 months ago
Alternatives and similar repositories for FrameThinker-RL
Users interested in FrameThinker-RL are comparing it to the libraries listed below.
- ☆35Updated last year
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆102Updated 3 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆53Updated 9 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆48Updated 2 months ago
- [NeurIPS 2024] The official code of paper "Automated Multi-level Preference for MLLMs"☆20Updated last year
- Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models☆75Updated 5 months ago
- ☆21Updated 8 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆59Updated 6 months ago
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning☆23Updated this week
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆65Updated 6 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆51Updated last year
- ☆32Updated last year
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆19Updated 5 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Updated 5 months ago
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆61Updated last week
- ☆46Updated 11 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆84Updated 2 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆42Updated 2 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆45Updated last year
- ☆18Updated last year
- [NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".☆38Updated last year
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆108Updated 2 weeks ago
- Official repository for CoMM Dataset☆48Updated 11 months ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models☆63Updated 7 months ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing"☆59Updated 6 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆37Updated last week
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆51Updated 5 months ago
- MR. Video: MapReduce is the Principle for Long Video Understanding☆28Updated 8 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆55Updated 8 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆86Updated 3 months ago