CaptionQA: Is Your Caption as Useful as the Image Itself?
☆34 · Mar 3, 2026 · Updated 3 months ago
Alternatives and similar repositories for CaptionQA
Users that are interested in CaptionQA are comparing it to the libraries listed below.
- [NeurIPS 2024] PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications ☆21 · Nov 4, 2024 · Updated last year
- Towards Memorization-Free Diffusion Models (CVPR2024) Codebase ☆11 · Jun 2, 2024 · Updated 2 years ago
- ☆14 · Apr 1, 2023 · Updated 3 years ago
- Code for our paper "HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition". ☆15 · Jan 3, 2023 · Updated 3 years ago
- [ICCV'23] PAINet: Parallel Attention Interaction Network for Few-shot Skeleton-based Action Recognition ☆11 · Oct 14, 2023 · Updated 2 years ago
- ☆11 · Mar 16, 2024 · Updated 2 years ago
- ☆12 · Sep 30, 2024 · Updated last year
- [CVPR 2026 Findings] Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach ☆23 · Jun 11, 2026 · Updated last week
- ☆13 · Apr 30, 2025 · Updated last year
- Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch ☆18 · Jan 25, 2018 · Updated 8 years ago
- [ 🎯 NeurIPS 2025 ] 3D-RAD 🩻: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks ☆31 · Oct 28, 2025 · Updated 7 months ago
- The official implementation of A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation ☆26 · Aug 17, 2025 · Updated 10 months ago
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers ☆34 · Dec 30, 2024 · Updated last year
- [CVPR2026] BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers ☆35 · Mar 17, 2026 · Updated 3 months ago
- [ICML 2026] A unified reinforcement learning toolbox for joint RL on language models and diffusion models ☆89 · May 26, 2026 · Updated 3 weeks ago
- ☆13 · Jul 22, 2024 · Updated last year
- Replication in Visual Diffusion Models: A Survey and Outlook ☆31 · Apr 5, 2026 · Updated 2 months ago
- This is a collection of publications about videos. ☆18 · Apr 29, 2021 · Updated 5 years ago
- Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation ☆26 · Oct 20, 2022 · Updated 3 years ago
- 🐧 Unify-Agent: An end-to-end unified multimodal agent for faithful, knowledge-grounded image generation. ☆82 · May 2, 2026 · Updated last month
- A simple and effective feature extractor for untrimmed videos ☆13 · Sep 1, 2022 · Updated 3 years ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆15 · Apr 7, 2026 · Updated 2 months ago
- The official implementation of the paper "Can Textual Gradient Work in Federated Learning?", accepted at ICLR 2025 ☆16 · Mar 10, 2025 · Updated last year
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation ☆26 · Dec 12, 2025 · Updated 6 months ago
- ☆40 · May 9, 2026 · Updated last month
- Front-end video computation: comparing native JS, WebAssembly, JS Worker, and CSS Filter (GPU multi-threaded acceleration) ☆22 · Dec 26, 2023 · Updated 2 years ago
- This is a comprehensive resource repository for deep learning model inversion attacks and defenses research. ☆30 · Nov 13, 2025 · Updated 7 months ago
- Generalization in Metric Learning: Should the Embedding Layer be the Embedding Layer? ☆11 · Jan 3, 2019 · Updated 7 years ago
- Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021) ☆11 · Oct 25, 2021 · Updated 4 years ago
- Code release for "Generative Modeling of Weights: Generalization or Memorization?" ☆21 · Apr 9, 2026 · Updated 2 months ago
- A Simple Framework for CV Pre-training Models (SOCO, VirTex, BEiT) ☆15 · Oct 18, 2021 · Updated 4 years ago
- ☆11 · Aug 10, 2024 · Updated last year
- ☆56 · Jun 4, 2025 · Updated last year
- Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries ☆42 · Nov 19, 2025 · Updated 6 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆20 · Oct 17, 2024 · Updated last year
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language". ☆18 · Sep 17, 2021 · Updated 4 years ago
- Belief Revision based Caption Re-ranker with Visual Semantic Information (COLING 2022) ☆11 · Apr 13, 2025 · Updated last year
- ☆15 · Dec 10, 2024 · Updated last year
- A funny cocos2dx game, inspired by a game called Fleabag Vs. Mutt. ☆10 · Aug 13, 2017 · Updated 8 years ago