[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
☆37Oct 18, 2023Updated 2 years ago
Alternatives and similar repositories for Tem-adapter
Users that are interested in Tem-adapter are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering☆20Jul 6, 2023Updated 2 years ago
- [CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events☆66Feb 9, 2026Updated last month
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆16Jun 20, 2023Updated 2 years ago
- [ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts☆14Jan 13, 2025Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023)☆29Dec 27, 2023Updated 2 years ago
- Learning Situation Hyper-Graphs for Video Question Answering☆22Feb 16, 2024Updated 2 years ago
- Implementation for the journal paper "DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering" (Jianyu et al., IEEE Tran…☆18Jun 22, 2021Updated 4 years ago
- ☆101Oct 19, 2022Updated 3 years ago
- ☆12Dec 15, 2023Updated 2 years ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆46Jun 9, 2025Updated 9 months ago
- A curated list of zero-shot captioning papers☆24Aug 26, 2023Updated 2 years ago
- Code for Static and Dynamic Concepts for Self-supervised Video Representation Learning.☆11Jul 28, 2022Updated 3 years ago
- ☆37Sep 16, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. "Binaryconnect: Training deep neural networks with binary weights during pro…☆12Aug 31, 2020Updated 5 years ago
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".☆119Oct 9, 2025Updated 5 months ago
- [NeurIPS 2023] The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" acce…☆27May 14, 2024Updated last year
- Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners☆117Sep 15, 2022Updated 3 years ago
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering☆198Jan 14, 2024Updated 2 years ago
- [CVPR2023] All in One: Exploring Unified Video-Language Pre-training☆281Mar 25, 2023Updated 3 years ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Sep 21, 2023Updated 2 years ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆84Jul 1, 2024Updated last year
- Codes for our ACM MM 2019 paper: "Exploiting Temporal Relationships in Video Moment Localization with Natural Language"☆16Oct 22, 2022Updated 3 years ago
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Official PyTorch code of GroundVQA (CVPR'24)☆64Sep 13, 2024Updated last year
- [NeurIPS 2025] The official repo of "DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding".☆29Feb 7, 2026Updated last month
- Counterfactual Reasoning VQA Dataset☆28Nov 23, 2023Updated 2 years ago
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)☆31Jun 28, 2024Updated last year
- An official implementation for MS-DETR in ACL'23☆17Jun 3, 2023Updated 2 years ago
- The efficient tuning method for VLMs☆81Mar 10, 2024Updated 2 years ago
- ☆22Sep 20, 2022Updated 3 years ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆56Jul 1, 2025Updated 8 months ago
- natual language guided image captioning☆87Feb 11, 2024Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models☆33Nov 27, 2025Updated 3 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft☆46Jul 17, 2024Updated last year
- Long Context Transfer from Language to Vision☆403Mar 18, 2025Updated last year
- An experiment with movie scenes and contrastive learning☆11Feb 1, 2025Updated last year
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Feb 28, 2023Updated 3 years ago
- Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text☆24Aug 15, 2022Updated 3 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆158Dec 9, 2024Updated last year