Xiuyuan-Chen / AutoEval-Video

☆36

Alternatives and similar repositories for AutoEval-Video

Users who are interested in AutoEval-Video are comparing it to the repositories listed below.

mlfoundations / VisIT-Bench
☆50 Updated last year
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆104 Updated last year
AGI-Edgerunners / IIL
Code for our Paper "All in an Aggregated Image for In-Image Learning"
☆29 Updated last year
RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆50 Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43 Updated last year
patrick-tssn / VideoHallucer
VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆37 Updated 6 months ago
thunlp / Muffin
☆65 Updated last year
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 Updated last year
artemisp / LAVIS-XInstructBLIP
LAVIS - A One-stop Library for Language-Vision Intelligence
☆49 Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
patrick-tssn / Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…
☆123 Updated 4 months ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆51 Updated 10 months ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83 Updated 8 months ago
OpenGVLab / Awesome-LLM4Tool
A curated list of papers, repositories, tutorials, and anything else related to large language models for tools
☆68 Updated 2 years ago
FudanNLPLAB / MouSi
☆74 Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆122 Updated 6 months ago
FudanDISC / ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆45 Updated last year
YiyangZhou / POVID
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88 Updated last year
imagegridworth / IG-VLM
☆138 Updated last year
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆29 Updated 9 months ago
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆70 Updated last year
DCDmllm / MorphTokens
☆43 Updated last year
kkahatapitiya / LangRepo
Language Repository for Long Video Understanding
☆32 Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94 Updated last year
RifleZhang / LLaVA-Reasoner-DPO
☆91 Updated 8 months ago
kyegomez / Mirasol
PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆25 Updated 8 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆57 Updated 11 months ago
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆48 Updated 6 months ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94 Updated last year
gzcch / Bingo
☆55 Updated last year