ispamm / GRAM
Official PyTorch repository for GRAM
☆98Updated 5 months ago
Alternatives and similar repositories for GRAM
Users who are interested in GRAM are comparing it to the libraries listed below
- Code for the paper "Compositional Entailment Learning for Hyperbolic Vision-Language Models".☆85Updated 4 months ago
- [ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning☆72Updated last year
- Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"☆46Updated 11 months ago
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation☆73Updated 4 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆61Updated last year
- [ICLR 2024] Test-Time RL with CLIP Feedback for Vision-Language Models.☆94Updated last week
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…☆57Updated last year
- A curated list of awesome self-supervised learning methods in videos☆155Updated 2 weeks ago
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆110Updated 2 months ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight]☆59Updated 3 months ago
- The official code for "TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning" | [AAAI2025]☆45Updated 7 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆130Updated 2 months ago
- Awesome papers & datasets specifically focused on long-term videos.☆321Updated 3 weeks ago
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape…☆52Updated 8 months ago
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition☆91Updated 9 months ago
- Composed Video Retrieval☆61Updated last year
- ☆44Updated last year
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval☆54Updated 11 months ago
- The official pytorch implementation of our CVPR-2024 paper "MMA: Multi-Modal Adapter for Vision-Language Models".☆83Updated 6 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆46Updated 6 months ago
- Code for paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" CVPR2024☆253Updated last month
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆66Updated last year
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24)☆30Updated last month
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"☆91Updated 7 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆47Updated 2 weeks ago
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)☆73Updated 8 months ago
- Easy wrapper for inserting LoRA layers in CLIP.☆40Updated last year
- ☆81Updated last year
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆51Updated last year
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…☆39Updated 6 months ago