OpenGVLab / vinci
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
☆56 Updated 3 months ago
Alternatives and similar repositories for vinci:
Users who are interested in vinci are comparing it to the repositories listed below
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation … ☆37 Updated last month
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆54 Updated 2 weeks ago
- VideoAuteur: Towards Long Narrative Video Generation ☆35 Updated 3 months ago
- ☆40 Updated 3 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆45 Updated last month
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆102 Updated 5 months ago
- Official code for MotionBench (CVPR 2025) ☆34 Updated last month
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization ☆11 Updated last week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆69 Updated last month
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆20 Updated 4 months ago
- Repo for "Human-Centric Foundation Models: Perception, Generation and Agentic Modeling" (https://arxiv.org/abs/2502.08556) ☆39 Updated 2 months ago
- Code for our paper: Learning Camera Movement Control from Real-World Drone Videos ☆27 Updated this week
- [CVPR2024] Official implementation of the paper: Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning ☆39 Updated 10 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆31 Updated this week
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆91 Updated last week
- [arXiv'24] Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space ☆41 Updated 5 months ago
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation ☆31 Updated 4 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 Updated last year
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" ☆26 Updated last year
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆66 Updated last month
- ☆72 Updated 2 weeks ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆107 Updated last month
- VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation ☆17 Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆48 Updated 3 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆71 Updated 6 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆90 Updated this week
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆113 Updated 3 months ago
- Official repo of "Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs" ☆52 Updated last month
- [NeurIPS 2024] Official code for paper "EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection" ☆30 Updated last month
- ☆9 Updated 10 months ago