ispamm / GRAM

Official PyTorch repository for GRAM

☆85

Alternatives and similar repositories for GRAM

Users who are interested in GRAM compare it to the libraries listed below.

PalAvik / hycoclip
Code for the paper "Compositional Entailment Learning for Hyperbolic Vision-Language Models".
☆77 Updated last month
Visual-AI / FROSTER
The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"
☆85 Updated 6 months ago
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆91 Updated last month
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆54 Updated 11 months ago
Malitha123 / awesome-video-self-supervised-learning
A curated list of awesome self-supervised learning methods in videos
☆149 Updated 3 weeks ago
JacobChalk / TIM
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆43 Updated 9 months ago
xjjxmu / TextRefiner
The official code for "TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning" | [AAAI2025]
☆42 Updated 4 months ago
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆60 Updated last year
muzairkhattak / ViFi-CLIP
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
☆285 Updated last year
mzhaoshuai / RLCF
[ICLR 2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models.
☆85 Updated last year
JiazuoYu / MoE-Adapters4CL
Code for paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" CVPR2024
☆231 Updated 8 months ago
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆64 Updated last week
yannqi / COMBO-AVS
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…
☆40 Updated 3 months ago
kyegomez / Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
☆184 Updated 2 weeks ago
Ziyang412 / UCoFiA
PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆65 Updated last year
OpenGVLab / TimeSuite
[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
☆40 Updated 4 months ago
sudo-Boris / mr-Blip
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
☆89 Updated 5 months ago
vvvb-github / AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
☆67 Updated 5 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆110 Updated 3 weeks ago
GeWu-Lab / Ref-AVS
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
☆45 Updated 8 months ago
Pter61 / osrcir
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight]
☆54 Updated last month
Timsty1 / FineCLIP
FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24)
☆24 Updated 7 months ago
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆49 Updated 11 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆76 Updated 3 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆135 Updated 3 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆69 Updated 3 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆82 Updated last year
HopLee6 / Sports-QA
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
☆33 Updated 3 weeks ago
Westlake-AI / SemiReward
[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
☆71 Updated last year
muzairkhattak / PromptSRC
[ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without F…
☆271 Updated last year