[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
☆37Oct 18, 2023Updated 2 years ago
Alternatives and similar repositories for Tem-adapter
Users who are interested in Tem-adapter are comparing it to the libraries listed below.
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering☆20Jul 6, 2023Updated 2 years ago
- [CVPR 2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events☆66Feb 9, 2026Updated 4 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆17Jun 20, 2023Updated 2 years ago
- [ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts☆13Jan 13, 2025Updated last year
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023)☆30Dec 27, 2023Updated 2 years ago
- Implementation for the journal paper "DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering" (Jianyu et al., IEEE Tran…☆18Jun 22, 2021Updated 4 years ago
- ☆104Oct 19, 2022Updated 3 years ago
- Learning Situation Hyper-Graphs for Video Question Answering☆23Feb 16, 2024Updated 2 years ago
- ☆12Dec 15, 2023Updated 2 years ago
- LMM for VQA, TCSVT version☆10Jul 19, 2024Updated last year
- A curated list of zero-shot captioning papers☆24Aug 26, 2023Updated 2 years ago
- Retrieval-augmented Image Captioning☆13Feb 16, 2023Updated 3 years ago
- Code for Static and Dynamic Concepts for Self-supervised Video Representation Learning.☆11Jul 28, 2022Updated 3 years ago
- ☆37Sep 16, 2024Updated last year
- Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. "Binaryconnect: Training deep neural networks with binary weights during pro…☆12Aug 31, 2020Updated 5 years ago
- Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners☆117Sep 15, 2022Updated 3 years ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆89Jul 1, 2024Updated last year
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Sep 21, 2023Updated 2 years ago
- [NeurIPS 2023] The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" acce…☆28May 14, 2024Updated 2 years ago
- Codes for our ACM MM 2019 paper: "Exploiting Temporal Relationships in Video Moment Localization with Natural Language"☆16Oct 22, 2022Updated 3 years ago
- Official PyTorch code of GroundVQA (CVPR'24)☆63Sep 13, 2024Updated last year
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)☆31Jun 28, 2024Updated last year
- Counterfactual Reasoning VQA Dataset☆28Nov 23, 2023Updated 2 years ago
- An official implementation for MS-DETR in ACL'23☆17Jun 3, 2023Updated 3 years ago
- The efficient tuning method for VLMs☆82Mar 10, 2024Updated 2 years ago
- ☆22Sep 20, 2022Updated 3 years ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆56Jul 1, 2025Updated 11 months ago
- Natural language guided image captioning☆89Feb 11, 2024Updated 2 years ago
- FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models☆33Nov 27, 2025Updated 6 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft☆46Jul 17, 2024Updated last year
- Long Context Transfer from Language to Vision☆407Mar 18, 2025Updated last year
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Feb 28, 2023Updated 3 years ago
- Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text☆24Aug 15, 2022Updated 3 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆159Dec 9, 2024Updated last year
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]☆126Jul 1, 2023Updated 2 years ago
- Evaluate robustness of adaptation methods on large vision-language models☆19Aug 23, 2023Updated 2 years ago
- ☆19Jun 14, 2025Updated last year
- [ICCV2023 Oral] Implicit Temporal Modeling with Learnable Alignment for Video Recognition☆41Nov 29, 2023Updated 2 years ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆49Nov 10, 2022Updated 3 years ago