aimagelab / ScanDiff
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
☆22Updated last month
Alternatives and similar repositories for ScanDiff
Users who are interested in ScanDiff are comparing it to the repositories listed below.
- Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024☆73Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆26Updated 11 months ago
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆37Updated 9 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆77Updated 2 weeks ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆30Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆94Updated 8 months ago
- Public release of the code for "Accelerating Vision Transformers with Adaptive Patches"☆79Updated 2 months ago
- Code for the paper "Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation", ECCV 2024☆45Updated last year
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)☆32Updated last year
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper☆94Updated last month
- Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"☆87Updated 7 months ago
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation…☆39Updated 10 months ago
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆63Updated 2 months ago
- Official code for CVPR2024 “VideoMAC: Video Masked Autoencoders Meet ConvNets”☆12Updated last year
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number]☆63Updated 11 months ago
- [NeurIPS 2024] Understanding Multi-Granularity for Open-Vocabulary Part Segmentation☆59Updated last year
- Official code for MotionBench (CVPR 2025)☆62Updated 10 months ago
- Personalized Representation from Personalized Generation (ICLR 2025)☆66Updated 10 months ago
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]☆76Updated 6 months ago
- Official repository for "SODA: Bottleneck Diffusion Models for Representation Learning"☆27Updated last year
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆129Updated 4 months ago
- Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports☆39Updated last week
- [CVPR24] Official Implementation of GEM (Grounding Everything Module)☆135Updated 9 months ago
- [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval☆33Updated 3 months ago
- This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, h…☆63Updated 8 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆80Updated last month