360CVGroup / 360VL
Our 2nd-gen LMM
☆33 Updated 9 months ago
Alternatives and similar repositories for 360VL:
Users who are interested in 360VL are comparing it to the repositories listed below.
- ☆29 Updated 7 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data, built on a 14B LLM. ☆37 Updated 6 months ago
- An LMM that solves catastrophic forgetting (AAAI 2025). ☆39 Updated 4 months ago
- ☆17 Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 Updated 8 months ago
- Chinese CLIP models with SOTA performance. ☆52 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 5 months ago
- ☆56 Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆23 Updated last year
- ☆73 Updated last year
- Video dataset dedicated to portrait-mode video recognition. ☆44 Updated 3 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆117 Updated 4 months ago
- ☆27 Updated 10 months ago
- Precision Search through Multi-Style Inputs ☆64 Updated 7 months ago
- LLaVA combined with the MAGVIT image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation. ☆35 Updated 9 months ago
- ☆86 Updated 8 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆118 Updated 4 months ago
- ☆67 Updated last year
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆48 Updated last year
- Explore the Limits of Omni-modal Pretraining at Scale ☆96 Updated 6 months ago
- ☆78 Updated 10 months ago
- A lightweight, highly efficient training framework for accelerating diffusion tasks. ☆46 Updated 6 months ago
- ☆80 Updated 10 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video… ☆30 Updated 8 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆75 Updated 4 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 7 months ago