johncaged / OPT_QuestionerLinks

Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"

☆15

Alternatives and similar repositories for OPT_Questioner

Users that are interested in OPT_Questioner are comparing it to the libraries listed below

Sorting:

Rubics-Xuan / IVG
This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…
☆17Updated last year
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆141Updated 3 months ago
WHB139426 / Grounded-Video-LLM
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆137Updated 3 months ago
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆60Updated 6 months ago
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆104Updated last year
DCDmllm / Momentor
☆80Updated last year
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆65Updated last year
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136Updated 7 months ago
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆54Updated last year
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆109Updated 6 months ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆55Updated 8 months ago
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆45Updated last year
Rubics-Xuan / MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆72Updated last year
yeliudev / R2-Tuning
🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
☆90Updated last year
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆91Updated 7 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆129Updated 4 months ago
CeeZh / SILVR
Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"
☆19Updated 3 months ago
zjr2000 / REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20Updated last year
showlab / VideoLISA
[NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆143Updated 11 months ago
jpthu17 / DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
☆53Updated last year
SCZwangxiao / video-ReTaKe
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆38Updated 8 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64Updated last year
contrastive / FreeVideoLLM
☆83Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆147Updated 5 months ago
OpenGVLab / TimeSuite
[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
☆57Updated 8 months ago
rxtan2 / Koala-video-llm
☆36Updated last year
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆58Updated 6 months ago