OFA-Sys / TouchStone
TouchStone: Evaluating Vision-Language Models by Language Models
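TouchStone's core idea is to use a powerful text-only LLM as the judge of a vision-language model's responses. Below is a minimal sketch of that LLM-as-judge pattern, assuming the `openai` v1 Python client; the judge model, prompt wording, and 1-to-10 scale are illustrative stand-ins, not TouchStone's actual evaluation pipeline:

```python
# Minimal LLM-as-judge sketch. Assumes the `openai` v1 Python client;
# the judge model, prompt wording, and 1-to-10 scale are illustrative
# stand-ins, not TouchStone's actual evaluation setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_answer(question: str, reference: str, candidate: str) -> str:
    """Ask a text-only LLM to grade a VLM's answer against a reference."""
    prompt = (
        "You are grading a vision-language model's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {candidate}\n"
        "Rate the model answer from 1 to 10 and briefly justify the score."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model works for this sketch
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic grading
    )
    return resp.choices[0].message.content
```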
Related projects:
- Official implementation of the paper "Needle In A Multimodal Haystack"
- LVBench: An Extreme Long Video Understanding Benchmark
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
- Official repository of the MMDU dataset
- Official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"
- InstructionGPT-4
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
- Official code for the paper "Mantis: Multi-Image Instruction Tuning"
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?"
- A Framework for Decoupling and Assessing the Capabilities of VLMs
- Official code for the paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
- Official repo for StableLLAVA
- [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
- Official GitHub repo of G-LLaVA
- 🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
- [EMNLP'23] Official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
- A collection of visual instruction tuning datasets.
- SVIT: Scaling up Visual Instruction Tuning