FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
☆55Apr 17, 2024Updated last year
Alternatives and similar repositories for FuseCap
Users that are interested in FuseCap are comparing it to the libraries listed below
Sorting:
- Fully Open Framework for Democratized Multimodal Reinforcement Learning.☆43Dec 19, 2025Updated 2 months ago
- [ECCV 2022] Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation☆19Jul 18, 2022Updated 3 years ago
- ☆18Dec 21, 2018Updated 7 years ago
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)"☆19Oct 25, 2018Updated 7 years ago
- The official implementation for Candidate Set Re-ranking for Composed Image Retrieval (TMLR) 01/2024☆20Feb 7, 2024Updated 2 years ago
- ☆58Apr 24, 2024Updated last year
- [CVPR 2025] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents☆32Jun 3, 2025Updated 9 months ago
- ILIAS: Instance-Level Image retrieval At Scale☆34Oct 6, 2025Updated 4 months ago
- ☆54Apr 13, 2023Updated 2 years ago
- Code for T-MARS data filtering☆35Aug 23, 2023Updated 2 years ago
- ☆31Dec 18, 2025Updated 2 months ago
- ☆24Feb 26, 2023Updated 3 years ago
- Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition☆28Aug 29, 2023Updated 2 years ago
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)☆340Jan 8, 2024Updated 2 years ago
- VisualGPTScore for visio-linguistic reasoning☆27Oct 7, 2023Updated 2 years ago
- This repository houses the code for the paper - "The Neglected of VLMs"☆30Dec 31, 2025Updated 2 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples☆40Nov 27, 2024Updated last year
- Densely Captioned Images (DCI) dataset repository.☆197Jul 1, 2024Updated last year
- The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Pro…☆34Feb 7, 2024Updated 2 years ago
- ☆38Feb 4, 2023Updated 3 years ago
- Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR)☆83Sep 12, 2023Updated 2 years ago
- Code for CVPR21 paper A Multiplexed Network for End-to-End, Multilingual OCR☆80Dec 2, 2022Updated 3 years ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆222Oct 20, 2025Updated 4 months ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆213Feb 27, 2024Updated 2 years ago
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics☆27Nov 1, 2025Updated 4 months ago
- Unofficial implement of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection☆33Apr 18, 2022Updated 3 years ago
- The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…☆34Jun 21, 2022Updated 3 years ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"☆251Jan 22, 2025Updated last year
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization☆18Feb 14, 2026Updated 2 weeks ago
- KL-loss☆10Oct 14, 2019Updated 6 years ago
- ☆12Feb 16, 2024Updated 2 years ago
- ☆17Apr 25, 2023Updated 2 years ago
- PaintGL, a realtime PBR 3D painting web-app.☆17Jun 23, 2021Updated 4 years ago
- Directed masked autoencoders☆14Feb 20, 2026Updated last week
- The implementation of Learning Instance and Task-Aware Dynamic Kernels for Few Shot Learning☆13Apr 14, 2024Updated last year
- ☆10Jul 6, 2023Updated 2 years ago
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L…☆45Jun 13, 2023Updated 2 years ago
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection☆43Jun 4, 2024Updated last year
- The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"☆168Jul 23, 2025Updated 7 months ago