Yuting-Gao / PyramidCLIP
Implementation of PyramidCLIP (NeurIPS 2022).
☆30 Updated 2 years ago
Alternatives and similar repositories for PyramidCLIP:
Users who are interested in PyramidCLIP are comparing it to the libraries listed below:
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023) ☆40 Updated last year
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆66 Updated 3 months ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆96 Updated last year
- SeqTR: A Simple yet Universal Network for Visual Grounding ☆131 Updated 2 months ago
- ☆87 Updated last year
- Turning to Video for Transcript Sorting ☆48 Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆84 Updated this week
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated 11 months ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022) ☆84 Updated 2 years ago
- [AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding ☆57 Updated 2 years ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆50 Updated this week
- [TPAMI reviewing] Towards Visual Grounding: A Survey ☆42 Updated this week
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆68 Updated 7 months ago
- A lightweight codebase for referring expression comprehension and segmentation ☆52 Updated 2 years ago
- ☆58 Updated last year
- ☆22 Updated last year
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆26 Updated last month
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆35 Updated 3 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆37 Updated last year
- 📍 Official pytorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS) ☆52 Updated last year
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆25 Updated this week
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆61 Updated 2 months ago
- Official implementation of TagAlign ☆34 Updated last month
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆54 Updated 7 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆45 Updated this week
- Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023 ☆48 Updated 11 months ago
- ☆88 Updated last year
- ☆114 Updated 7 months ago
- [ECCV2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval ☆75 Updated 2 years ago
- ☆110 Updated 11 months ago