EvolvingLMMs-Lab / NEO
NEO Series: Native Vision-Language Models from First Principles
☆225 Updated last month
Alternatives and similar repositories for NEO
Users who are interested in NEO are comparing it to the repositories listed below
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆329 Updated 5 months ago
- [NeurIPS 2025] Efficient Reasoning Vision Language Models ☆425 Updated 2 months ago
- (Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators ☆632 Updated 3 weeks ago
- Are Video Models Ready as Zero-shot Reasoners? ☆80 Updated last week
- ☆278 Updated 4 months ago
- A family of versatile and state-of-the-art video tokenizers. ☆426 Updated 3 months ago
- Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience" ☆210 Updated 4 months ago
- [NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT ☆418 Updated 2 months ago
- The code for PixelRefer & VideoRefer ☆330 Updated 3 weeks ago
- Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation ☆359 Updated last week
- GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities ☆305 Updated 7 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆94 Updated last month
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++ ☆206 Updated 4 months ago
- Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models" ☆152 Updated 2 months ago
- [NeurIPS 2025 D&B🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation ☆178 Updated last month
- ☆244 Updated last year
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆37 Updated 5 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆110 Updated 3 months ago
- ☆573 Updated 3 weeks ago
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation ☆117 Updated last year
- Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models ☆158 Updated last year
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆44 Updated this week
- An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆147 Updated last month
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆158 Updated 7 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆177 Updated last week
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆149 Updated 8 months ago
- Echo-4o ☆248 Updated last month
- An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought" ☆102 Updated 2 months ago
- [AAAI26] Next Patch Prediction ☆131 Updated 11 months ago
- Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views ☆78 Updated this week