dhg-wei / MCL
(ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
☆15 · Updated 2 weeks ago
Related projects:
- TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆13 · Updated 3 months ago
- Turning to Video for Transcript Sorting ☆44 · Updated last year
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆17 · Updated 3 weeks ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) ☆31 · Updated last year
- Compress conventional Vision-Language Pre-training data ☆49 · Updated 11 months ago
- ☆13 · Updated this week
- [ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension ☆14 · Updated 2 years ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆26 · Updated last year
- ☆21 · Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method. ☆28 · Updated last year
- [AAAI 2022 Oral] This is a PyTorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tail… ☆33 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023) ☆40 · Updated last year
- Code for Static and Dynamic Concepts for Self-supervised Video Representation Learning. ☆10 · Updated 2 years ago
- Temporal Alignment Representations with Contrastive Learning ☆22 · Updated last year
- Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query (ICCV2021) ☆20 · Updated 2 years ago
- VisualGPTScore for visio-linguistic reasoning ☆26 · Updated 11 months ago
- The PyTorch implementation for "Video-Text Pre-training with Learned Regions" ☆42 · Updated 2 years ago
- ☆55 · Updated 11 months ago
- [ECCV 2022] Official PyTorch implementation of "mc-BEiT: Multi-choice Discretization for Image BERT Pre-training" in European Conference … ☆22 · Updated 2 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆22 · Updated 3 months ago
- ☆55 · Updated last year
- ☆43 · Updated 2 months ago
- The official code of paper "Automated Multi-level Preference for MLLMs" ☆15 · Updated 3 weeks ago
- [ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval" ☆37 · Updated 2 months ago
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation ☆23 · Updated 2 weeks ago
- ☆56 · Updated 2 years ago
- ☆19 · Updated last month
- [ECCV'22 Poster] Explicit Image Caption Editing ☆21 · Updated last year
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. Also, visualization and qb norm search for best performance… ☆28 · Updated 5 months ago