neu-vi / FleVRS
FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024
☆21 · Updated 6 months ago
Alternatives and similar repositories for FleVRS
Users interested in FleVRS are comparing it to the repositories listed below.
- Self-reimplemented version of 4D-LRM. ☆30 · Updated 3 weeks ago
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image" ☆22 · Updated last year
- Code for our paper: Learning Camera Movement Control from Real-World Drone Videos ☆29 · Updated 2 months ago
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization ☆21 · Updated 2 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆70 · Updated last week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆75 · Updated 3 months ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆15 · Updated last month
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆41 · Updated last month
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆57 · Updated 9 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆70 · Updated 4 months ago
- [arXiv 2024] The official repository of the paper "Unsupervised Discovery of Object-Centric Neural Fields" ☆17 · Updated 4 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance ☆34 · Updated 3 weeks ago
- ☆39 · Updated last year
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆19 · Updated 3 months ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆40 · Updated 3 weeks ago
- A list of works on video generation towards world model ☆151 · Updated this week
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation… ☆37 · Updated 4 months ago
- ☆62 · Updated last year
- ☆26 · Updated last year
- VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation ☆20 · Updated 3 months ago
- Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆48 · Updated this week
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆48 · Updated 3 weeks ago
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆80 · Updated 2 weeks ago
- Unifying Specialized Visual Encoders for Video Language Models ☆21 · Updated last week
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆30 · Updated 2 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆60 · Updated 8 months ago
- ☆47 · Updated last month
- Sora Generates Videos with Stunning Geometrical Consistency ☆50 · Updated last year
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training" ☆36 · Updated last year
- The official repository of "Sekai: A Video Dataset towards World Exploration" ☆68 · Updated this week