zijianchen98 / GAIA
[NeurIPS'24 Spotlight] GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
☆37Updated 10 months ago
Alternatives and similar repositories for GAIA
Users that are interested in GAIA are comparing it to the libraries listed below
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆47Updated last year
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation☆33Updated 10 months ago
- Unified layout planning and image generation, ICCV2025☆40Updated last week
- [ICCV 2025 Highlight] LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs☆19Updated 2 months ago
- (CVPR 2023) Official implementation of the paper "Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos…☆31Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Updated last year
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos☆42Updated 2 months ago
- ☆83Updated last year
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation☆22Updated 6 months ago
- [ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding☆66Updated 6 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆88Updated last year
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model"☆66Updated 2 years ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆55Updated 10 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Updated 2 years ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆65Updated last year
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing☆30Updated last month
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models☆20Updated last year
- Official PyTorch code of GroundVQA (CVPR'24)☆64Updated last year
- ☆54Updated last year
- Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM☆77Updated 9 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆145Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Updated 11 months ago
- [ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model☆139Updated last year
- A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.☆19Updated last year
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models☆66Updated 8 months ago
- ④[ECCV 2024 Oral, Comparison among Multiple Images!] A study on open-ended multi-image quality comparison: a dataset, a model and a bench…☆86Updated last year
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition☆95Updated last year
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆78Updated 2 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆105Updated last year