iOPENCap / awesome-unimodal-training
text-only training or language-free training for multimodal tasks (image/audio/video captioning, retrieval, text2image)
⭐ 11 · Updated 9 months ago
Alternatives and similar repositories for awesome-unimodal-training
Users interested in awesome-unimodal-training are comparing it to the libraries listed below.
- 🔥 Omni large models and datasets for understanding and generating multi-modalities. ⭐ 15 · Updated 8 months ago
- Awesome multi-modal large language model papers/projects, with collections of popular training strategies, e.g., PEFT, LoRA. ⭐ 27 · Updated 11 months ago
- A curated list of Awesome Personalized Large Multimodal Models resources ⭐ 31 · Updated last month
- This is a curated list of "Continual Learning with Pretrained Models" research. ⭐ 18 · Updated last month
- 🎯 A collection of outstanding homepage templates relevant to your project. Create a homepage for your work! ⭐ 13 · Updated last month
- TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models ⭐ 16 · Updated 6 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ⭐ 104 · Updated 5 months ago
- ⭐ 11 · Updated 5 months ago
- ⭐ 45 · Updated last month
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation ⭐ 30 · Updated 3 months ago
- Official repo for the FoodieQA paper (EMNLP 2024) ⭐ 16 · Updated 3 weeks ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ⭐ 26 · Updated 6 months ago
- Mitigating Open-Vocabulary Caption Hallucinations (EMNLP 2024) ⭐ 16 · Updated 8 months ago
- A curated list of visual reasoning papers. ⭐ 28 · Updated 2 weeks ago
- ⭐ 27 · Updated 8 months ago
- This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension" ⭐ 73 · Updated 2 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ⭐ 15 · Updated last week
- A comprehensive collection of open-world papers from top-tier conferences and journals ⭐ 21 · Updated 6 months ago
- Implementation of "the first large-scale multimodal mixture of experts models," from the paper "Multimodal Contrastive Learning with…" ⭐ 31 · Updated 3 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ⭐ 89 · Updated 7 months ago
- A hot-pluggable tool for visualizing LLaVA's attention. ⭐ 20 · Updated last year
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ⭐ 40 · Updated 3 weeks ago
- Repo for the paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ⭐ 49 · Updated 4 months ago
- PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities" ⭐ 26 · Updated 5 months ago
- Code for the paper "Unified Text-to-Image Generation and Retrieval" ⭐ 15 · Updated last year
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ⭐ 47 · Updated 3 weeks ago
- An in-context learning research testbed ⭐ 19 · Updated 4 months ago
- List of resources for video retrieval. ⭐ 18 · Updated 3 years ago
- ⭐ 11 · Updated 9 months ago
- ✨✨ The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ⭐ 46 · Updated this week