swordlidev / LLaVA-MR

LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval

☆8

Alternatives and similar repositories for LLaVA-MR

Users who are interested in LLaVA-MR are comparing it to the libraries listed below.

dengandong / GroundMoRe
☆13 Updated 4 months ago
WeitaiKang / SegVG
[ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
☆60 Updated 9 months ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆35 Updated 8 months ago
ltttpku / CMMP
☆20 Updated 9 months ago
bimsarapathiraja / MCCL
MCCL: Multiclass Confidence and Localization Calibration for Object Detection
☆9 Updated last year
cv516Buaa / OV-VG
☆32 Updated last year
PKU-ICST-MIPL / DyFo_CVPR2025
☆78 Updated 2 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆57 Updated 9 months ago
lorebianchi98 / FG-CLIP
[CBMI2024 Best Paper] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?".
☆28 Updated 3 months ago
yunncheng / MMRL
[CVPR 2025] Official PyTorch Code for "MMRL: Multi-Modal Representation Learning for Vision-Language Models" and its extension "MMRL++: P…
☆62 Updated last month
lntzm / MESM
The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)
☆30 Updated last year
contrastive / FreeVideoLLM
☆81 Updated 9 months ago
yingsen1 / UniMD
UniMD: Towards Unifying Moment retrieval and temporal action Detection
☆51 Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆63 Updated last year
SliMM-X / CoMP-MM
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆30 Updated 4 months ago
iSEE-Laboratory / HD-OVD
(TMM 2025) Official repository of paper "A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection"
☆19 Updated 4 months ago
GeWu-Lab / Ref-AVS
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
☆45 Updated 8 months ago
linhuixiao / OneRef
[NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.
☆22 Updated this week
mlvlab / VidChain
Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…
☆21 Updated 6 months ago
Rubics-Xuan / IVG
This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…
☆17 Updated last year
cilinyan / ReVOS-api
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆18 Updated last year
lezhang7 / SAIL
[CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"
☆48 Updated last month
ludc506 / InternVL-X
☆15 Updated 4 months ago
wusize / CLIM
[AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation
☆29 Updated last year
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆23 Updated 7 months ago
Jiaxing-star / LLaVA-Octopus
☆11 Updated 7 months ago
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67 Updated 9 months ago
ruc-aimc-lab / TeachCLIP
[CVPR 2024] TeachCLIP for Text-to-Video Retrieval
☆35 Updated 3 months ago
GasolSun36 / MVP
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆22 Updated 11 months ago
bladewaltz1 / PromptSwitch
☆30 Updated last year