sanjayss34 / codevqaLinks

☆84

Alternatives and similar repositories for codevqa

Users that are interested in codevqa are comparing it to the libraries listed below

Sorting:

YujieLu10 / TIP
Multimodal-Procedural-Planning
☆92Updated 2 years ago
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial undersranding of vision-language models.
☆131Updated 2 years ago
lupantech / IconQA
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
☆52Updated last year
mlfoundations / VisIT-Bench
☆50Updated 2 years ago
wade3han / champagne
An official codebase for paper " CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos (ICCV 23)"
☆52Updated 2 years ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆103Updated last year
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆70Updated last year
dhansmair / flamingo-mini
Implementation of the deepmind Flamingo vision-language model, based on Hugging Face language models and ready for training
☆167Updated 2 years ago
prometheus-eval / prometheus-vision
[ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific…
☆78Updated last year
Hxyou / IdealGPT
Official Code of IdealGPT
☆35Updated 2 years ago
huggingface / m4-logs
M4 experiment logbook
☆57Updated 2 years ago
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88Updated last year
allenai / unified-io-inference
☆227Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83Updated last year
shizhediao / DaVinci
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆42Updated 2 years ago
gregor-ge / mBLIP
☆87Updated last year
LAION-AI / General-GPT
☆65Updated 2 years ago
kkahatapitiya / LangRepo
Language Repository for Long Video Understanding
☆32Updated last year
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆147Updated last month
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77Updated 3 months ago
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134Updated 2 years ago
umd-huang-lab / Mementos
☆31Updated last year
huggingface / OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…
☆206Updated last year
OpenGVLab / Awesome-LLM4Tool
A curated list of the papers, repositories, tutorials, and anythings related to the large language models for tools
☆68Updated 2 years ago
aszala / EnvGen
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
☆38Updated last year
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆59Updated last year
reka-ai / reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
☆181Updated 10 months ago
belindal / LaMPP
Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action
☆37Updated 2 years ago
facebookresearch / VLaMP
Code for “Pretrained Language Models as Visual Planners for Human Assistance”
☆61Updated 2 years ago
allenai / grit_official
Official repository for the General Robust Image Task (GRIT) Benchmark
☆54Updated 2 years ago