ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆13Updated 3 months ago
Alternatives and similar repositories for fsc-clip:
Users who are interested in fsc-clip are comparing it to the repositories listed below
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆32Updated last month
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆32Updated 3 months ago
- ☆29Updated last month
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆30Updated this week
- ☆10Updated 2 months ago
- ☆16Updated last year
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆12Updated last month
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)☆20Updated 5 months ago
- ☆37Updated 2 months ago
- ☆11Updated 6 months ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11Updated last year
- ☆22Updated 7 months ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆18Updated last week
- FreeVA: Offline MLLM as Training-Free Video Assistant☆54Updated 7 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated last month
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation (arXiv: 2410.12761)☆20Updated 3 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆14Updated 3 months ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding☆44Updated 5 months ago
- ☆20Updated last year
- ☆26Updated 5 months ago
- Official Repo of γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆28Updated 2 months ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…☆10Updated last week
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…☆34Updated 3 weeks ago
- This repository houses the code for the paper - "The Neglected of VLMs"☆25Updated last month
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆39Updated this week
- Code for paper: VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning☆34Updated last week
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆19Updated 4 months ago
- ☆15Updated 5 months ago
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)☆38Updated last year
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆13Updated 10 months ago