ChenyuHeidiZhang / VL-commonsenseView external linksLinks
☆15May 23, 2022Updated 3 years ago
Alternatives and similar repositories for VL-commonsense
Users that are interested in VL-commonsense are comparing it to the libraries listed below
Sorting:
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- Source code and data for Things not Written in Text: Exploring Spatial Commonsense from Visual Signals (ACL2022 main conference paper).☆20Oct 10, 2022Updated 3 years ago
- The official implement of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆21Jul 21, 2025Updated 6 months ago
- VaLM: Visually-augmented Language Modeling. ICLR 2023.☆56Mar 6, 2023Updated 2 years ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]☆45Jul 22, 2025Updated 6 months ago
- [EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning☆15May 13, 2025Updated 9 months ago
- ☆16May 23, 2023Updated 2 years ago
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…☆22Jul 5, 2024Updated last year
- [NeurIPS 2022] "A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models", Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li…☆21Jan 9, 2024Updated 2 years ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆49Jun 4, 2025Updated 8 months ago
- Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer (NeurIPS 2021))☆56Feb 6, 2023Updated 3 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆32Mar 26, 2025Updated 10 months ago
- Sentiment polarity annotations dataset☆26Nov 28, 2017Updated 8 years ago
- A curated list of zero-shot captioning papers☆24Aug 26, 2023Updated 2 years ago
- ☆59Sep 23, 2022Updated 3 years ago
- The SVO-Probes Dataset for Verb Understanding☆31Jan 28, 2022Updated 4 years ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆37Nov 10, 2024Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year
- 服务器 GPU 监控程序,当 GPU 属性满足预设条件时通过微信发送提示消息☆32Aug 10, 2021Updated 4 years ago
- [ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning☆32Sep 30, 2024Updated last year
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆31May 16, 2024Updated last year
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated 10 months ago
- ☆85Dec 4, 2022Updated 3 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆42Mar 11, 2025Updated 11 months ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Jul 27, 2021Updated 4 years ago
- ☆11Aug 20, 2025Updated 5 months ago
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation☆12Dec 5, 2025Updated 2 months ago
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model☆13Feb 15, 2024Updated 2 years ago
- ☆10Jul 13, 2024Updated last year
- Improving Continuous Sign Language Recognition with Adapted Image Models☆14Nov 10, 2025Updated 3 months ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆29Sep 12, 2025Updated 5 months ago
- [CVPR2025] Official code for Lost in Translation Found in Context☆23Jan 14, 2026Updated last month
- ☆45Aug 14, 2023Updated 2 years ago
- ☆12Dec 20, 2024Updated last year
- Official Repository for CLRCMD (Appear in ACL2022)☆43Feb 21, 2023Updated 2 years ago
- ☆12Jun 1, 2024Updated last year
- Tis is code for Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model (ACM MM 2024))☆12Aug 27, 2024Updated last year
- Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing"☆13Jun 1, 2022Updated 3 years ago