i2vec / A-survey-on-image-text-multimodal-models

The repository for "A survey on image-text multimodal models"

☆45

Alternatives and similar repositories for A-survey-on-image-text-multimodal-models

Users who are interested in A-survey-on-image-text-multimodal-models are comparing it to the repositories listed below.

JiaojiaoYe1994 / Awesome-DIffusionModels-paper
A curated list of papers on Diffusion Models for Multi-Modal
☆31Updated last year
Haihsu / blogs
☆58Updated 8 months ago
DAMO-NLP-SG / VCD
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
☆349Updated last year
zhengli97 / PromptKD
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
☆341Updated 2 weeks ago
wangxiao5791509 / MultiModal_BigModels_Survey
[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
☆289Updated 4 months ago
showlab / Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆907Updated 2 months ago
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆172Updated 4 months ago
ustc-hyin / ClearSight
Code for paper: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
☆41Updated 11 months ago
zhengli97 / Awesome-Prompt-Adapter-Learning-for-VLMs
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
☆716Updated this week
ssfgunner / IIS
[ICLR 2025 Spotlight] This is the official repository for our paper: ''Enhancing Pre-trained Representation Classifiability can Boost its…
☆23Updated 7 months ago
SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆854Updated 4 months ago
lhanchao777 / LVLM-Hallucinations-Survey
This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…
☆88Updated last year
yfzhang114 / Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
☆791Updated this week
friedrichor / Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
☆296Updated last week
haokunwen / Awesome-Composed-Image-Retrieval
Collection of Composed Image Retrieval (CIR) papers.
☆280Updated last month
taishan1994 / llava-handbook
Some study notes on the official LLaVA code
☆28Updated last year
statusrank / XCurve
XCurve is an end-to-end PyTorch library for X-Curve metrics optimizations in machine learning.
☆143Updated 2 years ago
zjukg / Structure-CLIP
[Paper][AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
☆153Updated last year
Gary-code / Awesome-LVLM-paper
List of papers about large multimodal models
☆31Updated 6 months ago
yanghlll / ScalingNoise
☆40Updated 8 months ago
JindongGu / Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation …
☆505Updated 8 months ago
beichenzbc / Long-CLIP
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
☆873Updated last year
shikiw / OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…
☆384Updated last year
muzairkhattak / multimodal-prompt-learning
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
☆789Updated 2 years ago
inFaaa / Multimodal-Roadmap-for-freshman
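This project is a learning roadmap for newcomers to the multimodal field, covering the field's classic papers, projects, and courses. It aims to help learners reach a fairly deep understanding of the field within a reasonable amount of time and become able to carry out independent research.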
☆41Updated last year
owenliang / mnist-clip
A super simple CLIP model using the MNIST dataset, for study
☆150Updated last year
yuanzhoulvpi2017 / vscode_debug_transformers
☆400Updated 9 months ago
NishilBalar / Awesome-LVLM-Hallucination
Up-to-date curated list of state-of-the-art research work, papers, and resources on hallucinations in large vision-language models
☆227Updated 2 months ago
jiazhen-code / PhD
[CVPR25 Highlight] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced eval…
☆27Updated 7 months ago
YiLunLee / missing_aware_prompts
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23
☆225Updated last year