KangLiao929 / Puffin
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
☆376 Updated last week
Alternatives and similar repositories for Puffin
Users who are interested in Puffin are comparing it to the libraries listed below.
- Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation ☆249 Updated last month
- [NeurIPS 2025 Spotlight] Towards Understanding Camera Motions in Any Video ☆259 Updated last month
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆406 Updated 5 months ago
- G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning ☆245 Updated this week
- [ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping ☆86 Updated last month
- 4DNeX: Feed-Forward 4D Generative Modeling Made Easy ☆814 Updated last month
- [NeurIPS 2025 Spotlight] A Native Multimodal LLM for 3D Generation and Understanding ☆534 Updated 2 months ago
- Official repo for "GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization" ☆243 Updated last week
- [ICLR'25] 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation ☆365 Updated 6 months ago
- Echo-4o ☆474 Updated last month
- Are Video Models Ready as Zero-shot Reasoners? ☆84 Updated last month
- RynnEC: Bringing MLLMs into Embodied World ☆383 Updated 2 months ago
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" ☆134 Updated 3 months ago
- [ECCV2024] DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling ☆228 Updated last month
- This is the repository that contains source code for the PhysGen3D. ☆239 Updated 4 months ago
- AnyTalker: Scaling Multi-person Talking Video Generation with Interactivity Refinement ☆252 Updated last month
- Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning" ☆124 Updated 3 months ago
- Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model ☆925 Updated 3 weeks ago
- Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, C… ☆318 Updated last year
- ☆280 Updated 5 months ago
- Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning ☆174 Updated 2 months ago
- 🌐 3D and 4D World Modeling: A Survey ☆760 Updated last month
- LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation ☆38 Updated 10 months ago
- Open-source SOTA multi-image editing model ☆827 Updated this week
- [Tutorial] Few-Step Distillation for Text-to-Image Generation: A Practical Guide ☆322 Updated 2 weeks ago
- ✨ WithAnyone is capable of generating high-quality, controllable, and ID consistent images ☆546 Updated last month
- [NeurIPS 2025 D&B🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation ☆190 Updated last week
- NEO Series: Native Vision-Language Models from First Principles ☆630 Updated last week
- A Unified Driving World Model for Future Generation and Perception ☆133 Updated 5 months ago
- Implementation of paper: Flux Already Knows – Activating Subject-Driven Image Generation without Training ☆140 Updated 4 months ago