bronyayang/CaptionQA

[CVPR '26] CaptionQA: Is Your Caption as Useful as the Image Itself?

☆38

Alternatives and similar repositories for CaptionQA

Users who are interested in CaptionQA are comparing it to the repositories listed below.

WPR001 / UGC_VideoCaptioner
☆16 · Jun 23, 2026 · Updated last month
bronyayang / HallE_Control
HallE-Control: Controlling Object Hallucination in LMMs
☆32 · Apr 10, 2024 · Updated 2 years ago
yaolinli / TimeChat-Captioner
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆49 · Jun 29, 2026 · Updated last month
InternLM / CapRL
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
☆226 · Jun 23, 2026 · Updated last month
HITsz-TMG / ICL-State-Vector
☆12 · Jul 4, 2024 · Updated 2 years ago
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
☆19 · Jun 7, 2024 · Updated 2 years ago
ydk122024 / PediatricsGPT
[NeurIPS 2024] PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications
☆21 · Nov 4, 2024 · Updated last year
TencentARC / ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44 · Nov 19, 2025 · Updated 8 months ago
zjr2000 / REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20 · Jul 17, 2024 · Updated 2 years ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆177 · Oct 6, 2025 · Updated 9 months ago
KPeng9510 / Trans4SOAR
☆14 · Apr 1, 2023 · Updated 3 years ago
baopj / Vid-Morp
☆12 · Dec 6, 2024 · Updated last year
baopj / E3M
[ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.
☆11 · Jul 16, 2024 · Updated 2 years ago
ljang0 / videowebarena
☆14 · Dec 25, 2024 · Updated last year
chunsanHong / MemBench_code
☆12 · Sep 30, 2024 · Updated last year
alibaba-mmai-research / HyRSMPlusPlus
Code for our paper "HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition".
☆15 · Jan 3, 2023 · Updated 3 years ago
R00Kie-Liu / Sampler
Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition
☆14 · Dec 22, 2022 · Updated 3 years ago
HVision-NKU / ASID-Caption
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…
☆68 · Mar 3, 2026 · Updated 4 months ago
starrycos / PAINet
[ICCV'23] PAINet: Parallel Attention Interaction Network for Few-shot Skeleton-based Action Recognition
☆11 · Oct 14, 2023 · Updated 2 years ago
Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated 2 years ago
MSIIP / Connector-S
☆13 · Apr 30, 2025 · Updated last year
xiaoqian-shen / Vgent
[NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent
☆49 · Nov 30, 2025 · Updated 7 months ago
DeepIntoStreams / GCN-DevLSTM
☆11 · Mar 16, 2024 · Updated 2 years ago
facebookresearch / MetaEmbed
[ICLR 2026 Oral] Official Implementation of the paper "MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interactio…
☆18 · Jul 2, 2026 · Updated 3 weeks ago
aimagelab / awesome-captioning-evaluation
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
☆36 · Nov 25, 2025 · Updated 8 months ago
HKU-MMLab / Math-VR-CodePlot-CoT
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
☆63 · Nov 4, 2025 · Updated 8 months ago
zhipeng-wei / TT
☆17 · Sep 23, 2022 · Updated 3 years ago
Lihr747 / CgtGAN
☆20 · May 3, 2025 · Updated last year
thunlp / NOSA
The official implementation of NOSA
☆19 · Jun 11, 2026 · Updated last month
ali-vilab / DreamRelation
[ICCV2025] The official code of "DreamRelation: Relation-Centric Video Customization"
☆27 · Feb 4, 2026 · Updated 5 months ago
facebookresearch / CausalVQA
We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ under…
☆62 · Aug 18, 2025 · Updated 11 months ago
MiniMax-AI / VTP
[ECCV 2026] Towards Scalable Pre-training of Visual Tokenizers for Generation
☆495 · Apr 15, 2026 · Updated 3 months ago
wusize / OpenUni
☆189 · Jun 27, 2025 · Updated last year
sinwang20 / D2PO
[ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…
☆18 · Jul 22, 2025 · Updated last year
EIT-NLP / Speak-While-Watching
☆17 · Mar 1, 2026 · Updated 4 months ago
SCZwangxiao / video-ReTaKe
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40 · Mar 16, 2025 · Updated last year
TiankaiHang / CCA
☆22 · Jan 26, 2024 · Updated 2 years ago
sdc17 / CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
☆34 · Dec 30, 2024 · Updated last year
usail-hkust / benchmark_inference_time_computation_LLM
[NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning
☆16 · Sep 20, 2025 · Updated 10 months ago