KangLiao929 / Puffin
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
☆359 Updated last week
Alternatives and similar repositories for Puffin
Users who are interested in Puffin are comparing it to the repositories listed below.
- [NeurIPS 2025 Spotlight] Towards Understanding Camera Motions in Any Video ☆249 Updated last week
- [ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping ☆81 Updated this week
- Official repo for "GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization" ☆200 Updated this week
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆391 Updated 4 months ago
- Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, C… ☆315 Updated last year
- [ICLR'25] 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation ☆361 Updated 5 months ago
- [NeurIPS 2025 Spotlight] A Native Multimodal LLM for 3D Generation and Understanding ☆510 Updated last month
- 4DNeX: Feed-Forward 4D Generative Modeling Made Easy ☆789 Updated 2 months ago
- [ECCV2024] DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling ☆223 Updated 4 months ago
- RynnEC: Bringing MLLMs into Embodied World ☆381 Updated last month
- G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning ☆194 Updated last week
- This is the repository that contains source code for the PhysGen3D. ☆232 Updated 2 months ago
- [CVPR2025] AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction ☆449 Updated 8 months ago
- Pytorch Implementation of FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing (ICLR 2024) ☆210 Updated last year
- Are Video Models Ready as Zero-shot Reasoners? ☆80 Updated last week
- [NeurIPS 2025 D&B🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation ☆178 Updated last month
- Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views ☆78 Updated this week
- ☆278 Updated 4 months ago
- LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation ☆37 Updated 9 months ago
- [ICCV-2025] Official implementation of Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data ☆93 Updated 4 months ago
- Unified Multimodal Model for image generation/editing/understanding ☆812 Updated 2 months ago
- NEO Series: Native Vision-Language Models from First Principles ☆225 Updated last month
- ☆54 Updated last month
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" ☆124 Updated last month
- [NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations ☆429 Updated last week
- Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning ☆162 Updated last month
- ✨ WithAnyone is capable of generating high-quality, controllable, and ID consistent images ☆520 Updated last month
- Implementation of paper: Flux Already Knows – Activating Subject-Driven Image Generation without Training ☆138 Updated 2 months ago
- Official implementation for paper: InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior ☆544 Updated last year
- [CVPR2024] Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion ☆134 Updated last year