CaptionQA: Is Your Caption as Useful as the Image Itself?
☆36Mar 3, 2026Updated last month
Alternatives and similar repositories for CaptionQA
Users that are interested in CaptionQA are comparing it to the libraries listed below.
- [NeurIPS 2024] PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications☆20Nov 4, 2024Updated last year
- ☆14Apr 1, 2023Updated 3 years ago
- Code for our paper "HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition".☆14Jan 3, 2023Updated 3 years ago
- Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition☆14Dec 22, 2022Updated 3 years ago
- [ICCV'23] PAINet: Parallel Attention Interaction Network for Few-shot Skeleton-based Action Recognition☆11Oct 14, 2023Updated 2 years ago
- ☆10Sep 30, 2024Updated last year
- Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach☆20Nov 17, 2025Updated 4 months ago
- ☆13Apr 30, 2025Updated 11 months ago
- [ 🎯 NeurIPS 2025 ] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks☆27Oct 28, 2025Updated 5 months ago
- The official implementation of A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation☆25Aug 17, 2025Updated 7 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆134Apr 9, 2025Updated last year
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers☆34Dec 30, 2024Updated last year
- a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆79Mar 31, 2026Updated 2 weeks ago
- ☆13Jul 22, 2024Updated last year
- Replication in Visual Diffusion Models: A Survey and Outlook☆31Apr 5, 2026Updated last week
- Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition☆24Feb 3, 2023Updated 3 years ago
- CatMAE☆14Dec 13, 2023Updated 2 years ago
- This is a collection of publications about videos.☆18Apr 29, 2021Updated 4 years ago
- Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries☆37Nov 19, 2025Updated 4 months ago
- Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation☆26Oct 20, 2022Updated 3 years ago
- ☆16Jul 6, 2023Updated 2 years ago
- A simple and effective feature extractor for untrimmed videos☆13Sep 1, 2022Updated 3 years ago
- The official implementation of paper "Can Textual Gradient Work in Federated Learning?" accepted at ICLR 2025☆16Mar 10, 2025Updated last year
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation☆27Dec 12, 2025Updated 4 months ago
- [NeurIPS2024 Oral] PyTorch implementation of DenoiseRep☆35Sep 23, 2025Updated 6 months ago
- ☆35Feb 15, 2026Updated 2 months ago
- Generalization in Metric Learning: Should the Embedding Layer be the Embedding Layer?☆11Jan 3, 2019Updated 7 years ago
- Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021)☆11Oct 25, 2021Updated 4 years ago
- Code release for "Generative Modeling of Weights: Generalization or Memorization?"☆19Apr 9, 2026Updated last week
- CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation☆50Apr 9, 2026Updated last week
- A Simple Framework for CV Pre-training Model (SOCO, VirTex, BEiT)☆15Oct 18, 2021Updated 4 years ago
- ☆11Aug 10, 2024Updated last year
- ☆11Sep 7, 2020Updated 5 years ago
- ☆53Jun 4, 2025Updated 10 months ago
- ☆13Sep 23, 2023Updated 2 years ago
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022☆11Apr 13, 2025Updated last year
- ☆43Dec 16, 2025Updated 4 months ago
- ☆15Dec 10, 2024Updated last year
- [CVPR2026 Highlight] Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens https://arxiv.org/abs…☆53Updated this week