Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117Sep 15, 2022Updated 3 years ago
Alternatives and similar repositories for VidIL
Users that are interested in VidIL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight☆37May 23, 2023Updated 2 years ago
- ☆11Nov 5, 2024Updated last year
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"☆136May 5, 2023Updated 2 years ago
- Data Release for VALUE Benchmark☆30Feb 16, 2022Updated 4 years ago
- ☆12Mar 12, 2023Updated 3 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts☆188May 1, 2025Updated 10 months ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"☆26Oct 20, 2022Updated 3 years ago
- PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)☆374Jul 29, 2023Updated 2 years ago
- [NeurIPS 2022] Egocentric Video-Language Pretraining☆258May 9, 2024Updated last year
- [EMNLP'22] Weakly-Supervised Temporal Article Grounding☆14Nov 25, 2023Updated 2 years ago
- MERLOT: Multimodal Neural Script Knowledge Models☆226Mar 15, 2022Updated 4 years ago
- [ECCV-2022]Grounding Visual Representations with Texts for Domain Generalization☆30Apr 7, 2023Updated 2 years ago
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering☆198Jan 14, 2024Updated 2 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆158Dec 9, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding☆64Mar 9, 2022Updated 4 years ago
- Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…☆51May 29, 2024Updated last year
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)☆211Dec 18, 2022Updated 3 years ago
- A one-stop shop for YouCook2 info such as leaderboard and recent advances on (cooking) video retrieval and captioning.☆41Jun 29, 2022Updated 3 years ago
- Starter Code for VALUE benchmark☆80Aug 23, 2022Updated 3 years ago
- ☆182Aug 20, 2022Updated 3 years ago
- A Unified Framework for Video-Language Understanding☆61Jun 17, 2023Updated 2 years ago
- [ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383☆420Oct 28, 2022Updated 3 years ago
- CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training☆34Nov 9, 2021Updated 4 years ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- A PyTorch implementation of VIOLET☆140Dec 17, 2023Updated 2 years ago
- Code for Look for the Change paper published at CVPR 2022☆36Oct 26, 2022Updated 3 years ago
- The Pytorch implementation for "Video-Text Pre-training with Learned Regions"☆43Jul 15, 2022Updated 3 years ago
- [ACL 2021] mTVR: Multilingual Video Moment Retrieval☆27Aug 20, 2022Updated 3 years ago
- Official Code of ECCV 2022 paper MS-CLIP☆91Jul 27, 2022Updated 3 years ago
- ☆193Oct 22, 2022Updated 3 years ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆39Mar 4, 2024Updated 2 years ago
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]☆380May 19, 2022Updated 3 years ago
- ☆87Mar 4, 2024Updated 2 years ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- [Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant☆23Sep 11, 2023Updated 2 years ago
- Language Models Can See: Plugging Visual Controls in Text Generation☆258Jun 1, 2022Updated 3 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆43Mar 11, 2025Updated last year
- Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"☆146Jun 1, 2022Updated 3 years ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆16Jun 20, 2023Updated 2 years ago
- A task-agnostic vision-language architecture as a step towards General Purpose Vision☆92Jul 14, 2021Updated 4 years ago
- ☆76Oct 22, 2022Updated 3 years ago