DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆40 Updated last year
Related projects
Alternatives and complementary repositories for caption-by-committee
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method. ☆29 Updated last year
- ☆50 Updated 2 years ago
- A PyTorch implementation of EmpiricalMVM ☆39 Updated 10 months ago
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023 ☆55 Updated last week
- Compress conventional Vision-Language Pre-training data ☆49 Updated last year
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆29 Updated last month
- ☆60 Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆22 Updated 5 months ago
- ☆101 Updated last year
- ☆25 Updated last year
- ☆55 Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023) ☆40 Updated last year
- ☆65 Updated last year
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆31 Updated last year