FudanCVL / OmniAVS
[ICCV 2025] Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
☆29 Updated last month
Alternatives and similar repositories for OmniAVS
Users who are interested in OmniAVS are comparing it to the repositories listed below.
- ☆44 Updated 11 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆71 Updated 10 months ago
- Transactions on Multimedia (TMM25) ☆16 Updated 5 months ago
- ICML2025 ☆57 Updated 3 weeks ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆65 Updated last week
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆49 Updated 9 months ago
- [NeurIPS 2024] COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing ☆24 Updated 9 months ago
- This is the official implementation for ControlVAR. ☆121 Updated 9 months ago
- ☆28 Updated last year
- WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction ☆50 Updated 3 weeks ago
- FQGAN: Factorized Visual Tokenization and Generation ☆53 Updated 5 months ago
- ☆127 Updated 3 months ago
- ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models ☆77 Updated 2 weeks ago
- [ArXiv 2025] Follow-Your-Shape: This repo is the official implementation of "Follow-Your-Shape: Shape-Aware Image Editing via Trajectory… ☆49 Updated last month
- CODA: Repurposing Continuous VAEs for Discrete Tokenization ☆28 Updated 2 months ago
- [ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/To… ☆139 Updated 2 months ago
- Official implementation of "STAR: Scale-wise Text-to-image generation via Auto-Regressive representations" ☆38 Updated 6 months ago
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform… ☆58 Updated last week
- [ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation ☆27 Updated last month
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆92 Updated last month
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆56 Updated 2 months ago
- ☆36 Updated 3 months ago
- Official implementation for "Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter" ☆47 Updated last year
- Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral]. ☆32 Updated 10 months ago
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆55 Updated 4 months ago
- ☆31 Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆217 Updated last month
- ☆45 Updated 5 months ago
- [NeurIPS'25 Spotlight] Boosting Generative Image Modeling via Joint Image-Feature Synthesis ☆64 Updated last week
- [NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think ☆102 Updated last week