ImKeTT / ReSee
[EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue PyTorch Implementation
☆12 · Updated 11 months ago
Related projects
Alternatives and complementary repositories for ReSee
- [NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation ☆11 · Updated last year
- This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness … ☆19 · Updated last year
- [TNNLS, to appear] FET-LM: Flow Enhanced Variational Auto-Encoder for Topic-Guided Language Modeling PyTorch Implementation ☆12 · Updated last year
- [Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics ☆38 · Updated 2 years ago
- [KBS] PCAE: A Framework of Plug-in Conditional Auto-Encoder for Controllable Text Generation PyTorch Implementation ☆24 · Updated last year
- Code for Debiasing Vision-Language Models via Biased Prompts ☆53 · Updated last year
- [Paperlist] Awesome paper list of controllable text generation via latent auto-encoders. Contributions of any kind are welcome. ☆49 · Updated last year
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 · Updated last year
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆17 · Updated last year
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities ☆33 · Updated 2 months ago
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? ☆13 · Updated last year
- EMNLP2023 - InfoSeek: A New VQA Benchmark focused on Visual Info-Seeking Questions ☆16 · Updated 5 months ago
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022 ☆30 · Updated last year
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022) ☆40 · Updated 2 years ago
- Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding” ☆29 · Updated last year
- CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training ☆34 · Updated 3 years ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆34 · Updated 8 months ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 · Updated last year
- This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimoda… ☆25 · Updated 6 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models". ☆32 · Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 · Updated 5 months ago
- This repo contains code and instructions for baselines in the VLUE benchmark. ☆41 · Updated 2 years ago
- Implementation for the paper "Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly" (ECCV 2022: https://arxiv.org/abs… ☆32 · Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated last year
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024) ☆41 · Updated 4 months ago
- Repository for the paper: Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models ☆25 · Updated 11 months ago