360CVGroup / 360VL
Our 2nd-gen LMM
☆33 Updated last year
Alternatives and similar repositories for 360VL
Users who are interested in 360VL are comparing it to the libraries listed below.
- ☆29 Updated 9 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data with a 14B LLM. ☆37 Updated 8 months ago
- Chinese CLIP models with SOTA performance. ☆55 Updated last year
- An LMM that solves catastrophic forgetting (AAAI 2025). ☆43 Updated last month
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation. ☆37 Updated 11 months ago
- ☆56 Updated last year
- Video dataset dedicated to portrait-mode video recognition. ☆49 Updated 5 months ago
- A Token-level Text Image Foundation Model for Document Understanding ☆92 Updated last month
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 Updated 7 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆22 Updated last year
- Precision Search through Multi-Style Inputs ☆69 Updated last month
- ☆68 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 8 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆49 Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video… ☆34 Updated 11 months ago
- ☆75 Updated 2 months ago
- ☆73 Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 Updated 11 months ago
- ☆17 Updated last year
- ☆21 Updated last week
- Research Code for Multimodal-Cognition Team in Ant Group ☆146 Updated 2 weeks ago
- A native Chinese text-to-image evaluation benchmark ☆9 Updated 10 months ago
- ☆79 Updated last year
- An open-source multimodal large language model based on baichuan-7b ☆73 Updated last year
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆32 Updated last week
- ☆28 Updated last year
- ☆36 Updated 8 months ago
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ☆33 Updated 2 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆120 Updated 6 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 Updated last year