ChocoWu / Awesome-Scene-Graph-for-CrossModal-Learning

A repository listing papers on scene graph generation and its applications.

☆20

Related projects:

HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆38 Updated 2 months ago
jkli1998 / DRM
Code for paper 'Leveraging Predicate and Triplet Learning for Scene Graph Generation'. (CVPR 2024)
☆22 Updated 2 weeks ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆42 Updated 3 months ago
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆33 Updated 4 months ago
franciszzj / VLPrompt
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
☆13 Updated 3 months ago
showlab / MovieSeq
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
☆17 Updated 3 weeks ago
Becomebright / GroundVQA
Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.
☆49 Updated last week
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆35 Updated last month
mrwu-mac / ControlMLLM
Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆44 Updated 3 weeks ago
DCDmllm / Momentor
☆43 Updated 2 months ago
mrwu-mac / R-Bench
Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" (ICML2024)
☆18 Updated 2 weeks ago
showlab / GEB-Plus
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
☆16 Updated 2 years ago
Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆47 Updated 2 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆39 Updated last week
knightyxp / DGL
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. Also, visualization and qb norm search for best performance…
☆28 Updated 5 months ago
icq-benchmark / icq-benchmark
☆11 Updated 2 months ago
Yui010206 / CREMA
☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆24 Updated 3 months ago
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆38 Updated 3 months ago
pkunlp-icler / MIC
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU
☆37 Updated 11 months ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆70 Updated 2 weeks ago
ubc-vision / IterativeSG
☆19 Updated last year
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆39 Updated 2 months ago
Share14 / ShareGemini
☆19 Updated last month
K-Nick / MS-DETR
An official implementation for MS-DETR in ACL'23
☆16 Updated last year
Yuqifan1117 / CaCao
This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"…
☆41 Updated 6 months ago
joeyz0z / MeaCap
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
☆31 Updated last month
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆36 Updated last year
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆20 Updated 4 months ago
lntzm / MESM
The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)
☆28 Updated 5 months ago
gpt4vision / OvSGTR
Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retent…
☆11 Updated this week