gimpong / AAAI25-S5VH

The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).

☆19

Alternatives and similar repositories for AAAI25-S5VH

Users who are interested in AAAI25-S5VH are comparing it to the libraries listed below.

OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆50 Updated 5 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆59 Updated last year
locuslab / llava-token-compression
☆45 Updated last year
hammoudhasan / DiffCLIP
Official Implementation of DiffCLIP: Differential Attention Meets CLIP
☆47 Updated 8 months ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 Updated last year
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆49 Updated last year
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆46 Updated 11 months ago
rui-qian / READ
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
☆48 Updated last month
THU-MIG / VTC-CLS
Official repo for the paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"
☆23 Updated 7 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
☆49 Updated 5 months ago
PKU-ML / adainf
Official code for ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?"
☆31 Updated last year
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32 Updated 8 months ago
cocoshe / I2EBench
[NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
☆26 Updated 4 months ago
GATECH-EIC / Castling-ViT
[CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
☆30 Updated last year
YBZh / DMN
CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
☆86 Updated last year
moatifbutt / awesome-diffusion-iclr-2025
List of diffusion related active submissions on OpenReview for ICLR 2025.
☆45 Updated last year
chenshuang-zhang / imagenet_d
[CVPR 2024 Highlight] ImageNet-D
☆45 Updated last year
LeapLabTHU / InLine
Official repository of InLine attention (NeurIPS 2024)
☆56 Updated 11 months ago
NUS-HPC-AI-Lab / DD-Ranking
Data distillation benchmark
☆71 Updated 5 months ago
jnypark / VideoMamba
☆27 Updated last year
OliverRensu / ARM
[ICLR2025] This repository is the official implementation of our paper "Autoregressive Pretraining with Mamba in Vision"
☆87 Updated 5 months ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆94 Updated 4 months ago
Saehyung-Lee / PlugIR
Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)
☆32 Updated 8 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆50 Updated last year
yu-rp / Dimple
Dimple, the first Discrete Diffusion Multimodal Large Language Model
☆112 Updated 4 months ago
qhfan / RALA
[CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention
☆36 Updated 8 months ago
yibingwei-1 / LatentMIM
[ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning…
☆29 Updated 8 months ago
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 Updated last year
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆74 Updated last week
mc-lan / ClearCLIP
[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
☆95 Updated 8 months ago