EvolvingLMMs-Lab / NEO
NEO Series: Native Vision-Language Models from First Principles
☆180 Updated this week
Alternatives and similar repositories for NEO
Users who are interested in NEO are comparing it to the libraries listed below:
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆104 Updated 2 months ago
- Official PyTorch implementation of TokenSet. ☆125 Updated 7 months ago
- ☆61 Updated 3 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆34 Updated 4 months ago
- An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆87 Updated last week
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆160 Updated last month
- Official PyTorch Implementation for Dual-Process Image Generation, ICCV 2025 ☆101 Updated last month
- 🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction" ☆157 Updated 3 months ago
- ☆91 Updated 4 months ago
- ☆166 Updated this week
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆113 Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆122 Updated 2 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆51 Updated this week
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆139 Updated 7 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆46 Updated 3 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆128 Updated 3 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" ☆70 Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆158 Updated 2 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆90 Updated 7 months ago
- ☆559 Updated last week
- ICML 2025 - Impossible Videos ☆77 Updated 3 months ago
- Quick Long Video Understanding ☆65 Updated 4 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆60 Updated 3 months ago
- Test-time Scaling for VAR models ☆25 Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆87 Updated 2 months ago
- ☆179 Updated 4 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆177 Updated 2 months ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆68 Updated last month
- ☆130 Updated last week
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆178 Updated last month