gimpong / AAAI25-S5VH
The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).
☆18 Updated 5 months ago
Alternatives and similar repositories for AAAI25-S5VH
Users who are interested in AAAI25-S5VH are comparing it to the repositories listed below.
- 🔥 Official PyTorch implementation for "LM4LV: A Frozen Large Language Model for Low-level Vision Tasks". ☆51 Updated 11 months ago
- ☆16 Updated 2 months ago
- [NeurIPS2024] Tune your restoration model with one 3090 GPU! ☆74 Updated 4 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆40 Updated 3 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 Updated 3 months ago
- List of diffusion related active submissions on OpenReview for ICLR 2025. ☆28 Updated 7 months ago
- [NeurIPS2024] Overcome hallucination of diffusion restoration models. ☆49 Updated last month
- [CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization ☆24 Updated 2 months ago
- ☆19 Updated 2 months ago
- ☆42 Updated 6 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆19 Updated 4 months ago
- [ECCV 2024] Official PyTorch Implementation of A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment ☆84 Updated 10 months ago
- Data distillation benchmark ☆64 Updated this week
- EMPO, A Fully Unsupervised RLVR Method ☆30 Updated this week
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention ☆21 Updated 2 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆55 Updated 2 weeks ago
- [ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" ☆43 Updated 11 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆28 Updated last year
- CLIP-MoE: Mixture of Experts for CLIP ☆37 Updated 7 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆18 Updated last month
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆22 Updated this week
- Official released code for VQA² series models ☆44 Updated last month
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models ☆20 Updated 6 months ago
- ☆10 Updated 9 months ago
- Official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆21 Updated last month
- VisualQuality-R1 is the first open-sourced NR-IQA model that can accurately describe and rate image quality. ☆40 Updated this week
- PyTorch code for our paper "Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment" ☆45 Updated 2 weeks ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆53 Updated 7 months ago
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆36 Updated 3 months ago
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding" ☆43 Updated 3 months ago