devaansh100 / CLIPTrans
Official implementation for the paper "Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation", published at ICCV'23.
☆17 · Updated 3 months ago
Related projects:
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆40 · Updated 3 months ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 · Updated 10 months ago
- Official code of *Towards Event-oriented Long Video Understanding* ☆10 · Updated last month
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆42 · Updated 9 months ago
- Retrieval-augmented Image Captioning ☆12 · Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆68 · Updated 7 months ago
- CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training ☆33 · Updated 2 years ago
- Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?" ☆10 · Updated 7 months ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023) ☆32 · Updated last year
- PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE) ☆24 · Updated last year
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 · Updated last year
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models" ☆24 · Updated 9 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆51 · Updated last year
- Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆34 · Updated 2 months ago
- [ECCV'22 Poster] Explicit Image Caption Editing ☆21 · Updated last year
- Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Ars… ☆15 · Updated last month
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆11 · Updated last month
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆12 · Updated last month
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆24 · Updated 3 months ago
- Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆26 · Updated 4 months ago
- PHASE annotations for societal bias in vision-and-language tasks. ☆15 · Updated 3 months ago