Qinying-Liu / Awesome-omni-modal-understanding
Collection of papers about video-audio understanding
☆19 Updated last week
Alternatives and similar repositories for Awesome-omni-modal-understanding
Users that are interested in Awesome-omni-modal-understanding are comparing it to the libraries listed below
- [TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning". ☆10 Updated last year
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023 ☆161 Updated last year
- Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval --ICCV2023 Oral ☆91 Updated 2 years ago
- a unified and simple codebase for weakly-supervised temporal action localization ☆19 Updated 2 years ago
- Paper Reading of IMCC groups. ☆18 Updated 2 months ago
- ☆15 Updated last year
- Accepted by ICCV2023, Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-bas… ☆103 Updated last year
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ☆311 Updated 8 months ago
- [TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning” ☆46 Updated last year
- [NeurIPS 2024] A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era ☆11 Updated last year
- [ACL'25 Main] Official Implementation of HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Languag… ☆44 Updated 3 months ago
- ☆17 Updated 8 months ago
- [TPAMI 2024] This is the official Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding"… ☆27 Updated 7 months ago
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin… ☆90 Updated last year
- ☆154 Updated 10 months ago
- Implementation of "Interleaved Latent Visual Reasoning with Selective Perceptual Modeling". ☆34 Updated 3 weeks ago
- ☆26 Updated 8 months ago
- paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi… ☆31 Updated 2 weeks ago
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs) ☆98 Updated last year
- MomentDiff: Generative Video Moment Retrieval from Random to Real--NeurIPS 2023 ☆80 Updated 2 years ago
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding ☆364 Updated last year
- ☆37 Updated 5 months ago
- Codes of the Fine-grained Textual Inversion network for Zero-Shot Composed Image Retrieval ☆27 Updated 8 months ago
- 🌟 A step-by-step guide to inserting code links in your paper ☆22 Updated 4 months ago
- Code for paper "LLMs Can Evolve Continually on Modality for X-Modal Reasoning" NeurIPS2024 ☆41 Updated last year
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆154 Updated 9 months ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning ☆42 Updated last year
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ☆77 Updated 2 years ago
- [CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception ☆19 Updated 6 months ago
- ☆11 Updated 2 years ago