iOPENCap / awesome-unimodal-trainingLinks
text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)
☆12Updated last year
Alternatives and similar repositories for awesome-unimodal-training
Users that are interested in awesome-unimodal-training are comparing it to the libraries listed below
Sorting:
- ☆11Updated last year
- ☆11Updated last year
- Mitigating Open-Vocabulary Caption Hallucinations (EMNLP 2024)☆19Updated last year
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆58Updated last year
- A hot-pluggable tool for visualizing LLaVA's attention.☆24Updated 2 years ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆18Updated last year
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆27Updated 5 months ago
- Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models☆77Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆38Updated 2 months ago
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning☆69Updated 4 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Updated 7 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆132Updated 4 months ago
- HallE-Control: Controlling Object Hallucination in LMMs☆31Updated last year
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives☆39Updated 4 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆64Updated last year
- [ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'☆22Updated last year
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling☆142Updated 5 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆104Updated 5 months ago
- [ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…☆185Updated 4 months ago
- ☆23Updated 9 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Updated last year
- Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text …☆14Updated 2 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"☆34Updated last year
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆47Updated this week
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆18Updated 10 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆53Updated 6 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆163Updated last year
- CLIP-MoE: Mixture of Experts for CLIP☆55Updated last year
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆53Updated last year
- Official Repo for FoodieQA paper (EMNLP 2024)☆19Updated 7 months ago