yangbang18 / ZeroNLG
(TPAMI'2024) ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
☆21, updated 9 months ago
Alternatives and similar repositories for ZeroNLG
Users who are interested in ZeroNLG are comparing it to the repositories listed below:
- A comprehensive overview of affective computing research in the era of large language models (LLMs). (☆22, updated 9 months ago)
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024) (☆23, updated 9 months ago)
- PyTorch implementation of the model from "MIRASOL3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities" (☆26, updated 3 months ago)
- Retrieval-augmented Image Captioning (☆13, updated 2 years ago)
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings (☆55, updated 11 months ago)
- Preference Learning for LLaVA (☆44, updated 6 months ago)
- Released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models" (☆32, updated last year)
- Code and data for the ACL 2024 paper "Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space" (☆13, updated 9 months ago)
- PyTorch implementation of StableMask (ICML'24) (☆12, updated 10 months ago)
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (☆14, updated last month)
- Project for the SNARE benchmark (☆11, updated 11 months ago)
- [AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning (☆16, updated last year)
- 🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral) (☆64, updated last year)
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (☆44, updated 11 months ago)
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality (☆16, updated 7 months ago)
- Code for our paper "All in an Aggregated Image for In-Image Learning" (☆30, updated last year)
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding (☆54, updated last month)
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study (☆14, updated 5 months ago)
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning (☆49, updated last year)
- This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimoda… (☆27, updated 2 months ago)
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" (☆53, updated last year)
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) (☆49, updated 6 months ago)
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio (☆46, updated 7 months ago)
- An automatic MLLM hallucination detection framework (☆19, updated last year)
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models (☆29, updated 7 months ago)
- Official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" (☆73, updated 5 months ago)