sterzhang/image-textualization

sterzhang / image-textualization

Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)

☆172

Alternatives and similar repositories for image-textualization

Users who are interested in image-textualization are comparing it to the repositories listed below.

sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆34 · Mar 6, 2025 · Updated last year

pipilurj / ROBOT
☆27 · Apr 11, 2023 · Updated 3 years ago

SKURA502 / sae-analysis
A toolkit for systematically understanding the concepts encoded in Sparse Autoencoders.
☆20 · Apr 5, 2026 · Updated 3 months ago

xyq7 / Human-Contribution-Measurement
☆13 · Jun 4, 2025 · Updated last year

pipilurj / MLLM-protector
The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
☆46 · Apr 21, 2024 · Updated 2 years ago

pipilurj / bootstrapped-preference-optimization-BPO
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆63 · Aug 23, 2024 · Updated last year

W-Ted / UDC-NeRF
Official code for ICCV2023 paper: Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis
☆34 · Dec 27, 2023 · Updated 2 years ago

zhxieml / remiss-jailbreak
☆33 · Jun 24, 2024 · Updated 2 years ago

foundation-multimodal-models / CAPTURE
☆86 · Jul 27, 2024 · Updated last year

hanghuacs / FineCaption
☆39 · Jun 20, 2025 · Updated last year

showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year

EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year

M3-IT / YING-VLM
Vision Large Language Models trained on M3IT instruction tuning dataset
☆17 · Aug 16, 2023 · Updated 2 years ago

pipilurj / G-LLaVA
Official GitHub repo of G-LLaVA
☆154 · Feb 20, 2025 · Updated last year

FlagOpen / RoboBrain_Dex
☆42 · May 3, 2026 · Updated 2 months ago

ugorsahin / Generative-Negative-Mining
[WACV 2024] Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining
☆13 · Jan 3, 2024 · Updated 2 years ago

baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374 · Jul 24, 2025 · Updated last year

lscpku / VITATECS
☆18 · Jul 10, 2024 · Updated 2 years ago

vlm2-bench / VLM2-Bench
VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
☆45 · May 20, 2025 · Updated last year

Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆64 · Apr 8, 2024 · Updated 2 years ago

XuankunRong / SafeGRPO
[CVPR'26] SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
☆21 · Feb 19, 2026 · Updated 5 months ago

Visualignment / SafetyDPO
☆34 · Aug 26, 2025 · Updated 10 months ago

RobertBiehl / multimodal-instruct
Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use.
☆13 · Mar 13, 2024 · Updated 2 years ago

foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58 · Sep 26, 2024 · Updated last year

RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year

si0wang / VisVM
☆46 · Dec 30, 2024 · Updated last year

zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆24 · Aug 13, 2024 · Updated last year

MonolithFoundation / Bumblebee
A simple MLLM that surpassed QwenVL-Max using only open-source data, with a 14B LLM.
☆38 · Sep 9, 2024 · Updated last year

dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆29 · Sep 27, 2024 · Updated last year

yigu1008 / Diffusion-RPO
☆15 · Mar 30, 2025 · Updated last year

OptimalScale / DetGPT
☆786 · Aug 7, 2024 · Updated last year

XuankunRong / BYE
[NeurIPS'25] Backdoor Cleaning without External Guidance in MLLM Fine-tuning
☆20 · Oct 13, 2025 · Updated 9 months ago

JIA-Lab-research / Mr-Ben
This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
☆51 · Oct 31, 2024 · Updated last year

SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆47 · Sep 25, 2023 · Updated 2 years ago

prismformore / SDSEN
☆20 · May 26, 2020 · Updated 6 years ago

xk-huang / segment-caption-anything
[CVPR'24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin…
☆232 · Sep 30, 2024 · Updated last year

prismformore / expAT
TIP: Bi-directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification
☆21 · Mar 29, 2021 · Updated 5 years ago

rt219 / LatentGuard
This is the official repo of the paper "Latent Guard: a Safety Framework for Text-to-image Generation"
☆54 · Oct 24, 2024 · Updated last year

uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago