rednote-hilab / dots.vlm1
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
☆285 · Sep 26, 2025 · Updated 4 months ago
Alternatives and similar repositories for dots.vlm1
Users who are interested in dots.vlm1 are comparing it to the repositories listed below.
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,156 · Jul 15, 2025 · Updated 6 months ago
- Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer ☆136 · Oct 14, 2025 · Updated 4 months ago
- GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆2,182 · Jan 27, 2026 · Updated 2 weeks ago
- ☆18 · Jun 10, 2023 · Updated 2 years ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆367 · Jul 24, 2025 · Updated 6 months ago
- [ICLR 2026] Geometric-Mean Policy Optimization ☆100 · Jan 26, 2026 · Updated 2 weeks ago
- ACL 2025: Synthetic data generation pipelines for text-rich images. ☆155 · Mar 1, 2025 · Updated 11 months ago
- [ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching i… ☆44 · Jul 10, 2025 · Updated 7 months ago
- ☆17 · Aug 5, 2025 · Updated 6 months ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,430 · Sep 22, 2025 · Updated 4 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 · Jun 9, 2024 · Updated last year
- ☆100 · Aug 8, 2025 · Updated 6 months ago
- ☆52 · Jul 16, 2025 · Updated 6 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,544 · Jun 14, 2025 · Updated 8 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation ☆30 · Dec 22, 2025 · Updated last month
- ☆813 · Jun 9, 2025 · Updated 8 months ago
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- ☆15 · Nov 11, 2024 · Updated last year
- MegaRAG: Multimodal Graph-based RAG ☆33 · Sep 16, 2025 · Updated 4 months ago
- Kubernetes Gateway API implementation in Rust ☆22 · Updated this week
- A local search system implementation using Elasticsearch for Wikipedia data indexing and retrieval. ☆12 · May 17, 2025 · Updated 8 months ago
- Code for the VOST dataset ☆26 · Oct 1, 2023 · Updated 2 years ago
- ☆25 · Jun 24, 2021 · Updated 4 years ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ☆210 · Oct 15, 2025 · Updated 4 months ago
- Codes for DATA: Differentiable ArchiTecture Approximation. ☆11 · Jul 22, 2021 · Updated 4 years ago
- ☆14 · Feb 28, 2023 · Updated 2 years ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆45 · Sep 19, 2025 · Updated 4 months ago
- [NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding. ☆45 · Jan 28, 2026 · Updated 2 weeks ago
- ☆12 · Sep 6, 2023 · Updated 2 years ago
- The official implementation of CTRNet++. ☆14 · Dec 30, 2024 · Updated last year
- ☆19 · Dec 20, 2025 · Updated last month
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.