ispamm / GRAM
Official PyTorch repository for GRAM
☆93 · Updated 4 months ago
Alternatives and similar repositories for GRAM
Users who are interested in GRAM are comparing it to the libraries listed below.
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations · ☆107 · Updated 2 weeks ago
- Code for the paper "Compositional Entailment Learning for Hyperbolic Vision-Language Models" · ☆82 · Updated 3 months ago
- Codebase for the paper "TIM: A Time Interval Machine for Audio-Visual Action Recognition" · ☆43 · Updated 10 months ago
- [CVPR 2023] Official repository of the paper "Fine-tuned CLIP models are efficient video learners" · ☆290 · Updated last year
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… · ☆55 · Updated last year
- A curated list of awesome self-supervised learning methods in videos · ☆152 · Updated last month
- [CVPR 2025] RAP: Retrieval-Augmented Personalization · ☆69 · Updated last month
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs" · ☆91 · Updated 6 months ago
- The official code for "TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning" [AAAI 2025] · ☆44 · Updated 6 months ago
- A suite for modeling video with Mamba · ☆278 · Updated last year
- Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports · ☆33 · Updated 2 months ago
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition · ☆88 · Updated 8 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval · ☆60 · Updated last year
- Code repository for "Post-pre-training for Modality Alignment in Vision-Language Foundation Models" (CVPR 2025) · ☆29 · Updated last month
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training · ☆31 · Updated 5 months ago
- Code for the paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" (CVPR 2024) · ☆239 · Updated 2 weeks ago
- [CVPR 2025 Highlight] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models" · ☆49 · Updated last month
- Official Implementation of SnAG (CVPR 2024) · ☆51 · Updated 4 months ago
- Open-source implementation of "Vision Transformers Need Registers" · ☆190 · Updated last week
- [NeurIPS 2024] Repository for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" · ☆192 · Updated 2 months ago
- ☆41 · Updated 10 months ago
- PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) · ☆66 · Updated last year
- Official PyTorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape… · ☆52 · Updated 6 months ago
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation · ☆70 · Updated 3 months ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] · ☆58 · Updated 2 months ago
- [CVPR 2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments" · ☆287 · Updated last year
- [ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning · ☆71 · Updated last year
- Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding · ☆42 · Updated 2 weeks ago
- Awesome papers & datasets specifically focused on long-term videos · ☆313 · Updated last month
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions · ☆136 · Updated 4 months ago