chenin-wang / awesome_ai_paper

☆21

Alternatives and similar repositories for awesome_ai_paper

Users who are interested in awesome_ai_paper are comparing it to the repositories listed below

kesenzhao / UV-CoT
☆33 Updated last month
Hui-design / Open-LLaVA-Video-R1
[LLaVA-Video-R1] ✨First Adaptation of R1 to LLaVA-Video (2025-03-18)
☆32 Updated 4 months ago
ZhangXJ199 / TinyLLaVA-Video-R1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
☆104 Updated 4 months ago
minglllli / CLS-RL
[NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆66 Updated last week
Wang-Xiaodong1899 / CVPR25-MLLM-Paper-List
🔥CVPR 2025 Multimodal Large Language Models Paper List
☆153 Updated 6 months ago
ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆92 Updated last month
mrwu-mac / ControlMLLM
[NeurIPS 2024] Repo for the paper 'ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆192 Updated 2 months ago
yuanpinz / awesome-deep-multimodal-reasoning
A collection of awesome works on reasoning models like O1/R1 in the visual domain
☆41 Updated 2 months ago
yu-rp / VisualPerceptionToken
☆122 Updated 6 months ago
Hui-design / TSPO
[✨Official Code of TSPO] Temporal Sampling Policy Optimization for Long-form Video Language Understanding
☆44 Updated 3 weeks ago
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆90 Updated 2 months ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆265 Updated 5 months ago
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆70 Updated last month
PKU-ICST-MIPL / Finedefics_ICLR2025
☆71 Updated 5 months ago
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆198 Updated 5 months ago
taishan1994 / llava-handbook
Some study notes on the official LLaVA code
☆29 Updated 11 months ago
1zhou-Wang / MemVR
[ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…
☆153 Updated 3 weeks ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆181 Updated 4 months ago
jungao1106 / ICoT
[CVPR'25] Interleaved-Modal Chain-of-Thought
☆86 Updated last month
threegold116 / Awesome-Omni-MLLMs
This is for the ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
☆60 Updated 2 weeks ago
MAC-AutoML / QuoTA
This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens…
☆73 Updated 4 months ago
Video-R1 / Awesome-Multimodal-Reasoning
Collections of Papers and Projects for Multimodal Reasoning.
☆105 Updated 5 months ago
ding523 / Curr_REFT
☆72 Updated 4 months ago
ggg0919 / cantor
☆90 Updated last year
Wild-Cooperation-Hub / Awesome-MLLM-Reasoning-Benchmarks
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
☆68 Updated 6 months ago
xmed-lab / TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
☆78 Updated last month
ustc-hyin / ClearSight
Code for the paper: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
☆31 Updated 9 months ago
ADaM-BJTU / Mind_with_eyes_Awesome_MLLMs_Reasoning
This repository is continuously updated with the latest papers, technical reports, and benchmarks on multimodal reasoning!
☆51 Updated 6 months ago
contrastive / FreeVideoLLM
☆81 Updated 10 months ago
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆140 Updated 2 months ago