MonolithFoundation / Bumblebee

A simple MLLM that surpasses QwenVL-Max using only open-source data, built on a 14B LLM.

☆38

Alternatives and similar repositories for Bumblebee

Users who are interested in Bumblebee are comparing it to the libraries listed below.

360CVGroup / 360VL
Our 2nd-gen LMM
☆34 Updated last year
RhapsodyAILab / MiniCPM-V-Embedding
☆29 Updated last year
Ucas-HaoranWei / Vary-family
☆57 Updated last year
TencentARC-QQ / QA-CLIP
Chinese CLIP models with SOTA performance.
☆58 Updated 2 years ago
xverse-ai / XVERSE-V-13B
☆79 Updated last year
360CVGroup / SEEChat
Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM
☆101 Updated last year
bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
☆252 Updated 2 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆63 Updated 11 months ago
360AILABNLP / 360LayoutAnalysis
☆27 Updated last year
Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, including English and Chinese)
☆86 Updated last year
zai-org / GLM-Edge
GLM Series Edge Models
☆149 Updated 4 months ago
WePOINTS / WePOINTS
☆186 Updated 8 months ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆126 Updated 11 months ago
opendatalab / image-downloader
☆28 Updated last year
pleisto / yuren-baichuan-7b
An open-source multimodal large language model based on baichuan-7b
☆72 Updated last year
Token-family / TokenFD
[ICCV2025] A Token-level Text Image Foundation Model for Document Understanding
☆121 Updated last month
ucaslcl / Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆155 Updated last year
WalkerMitty / PDFparser
A demo PDF parser (including OCR and object detection tools)
☆36 Updated last year
vaew / SkyScript-100M
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2
☆127 Updated 11 months ago
will-singularity / Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22 Updated last year
yujunhuics / Reyes
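A multimodal large model implemented from scratch and named Reyes (睿视; R for 睿, "wise", and eyes for 眼). Reyes has 8B parameters; its vision encoder is InternViT-300M-448px-V2_5, its language model is Qwen2.5-7B-Instruct, and Reyes also uses a two-layer MLP projection layer to connect…
☆26 Updated 8 months ago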
rednote-hilab / dots.vlm1
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
☆260 Updated 3 weeks ago
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆63 Updated 5 months ago
xmu-xiaoma666 / Multimodal-Open-O1
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆29 Updated last year
large-ocr-model / large-ocr-model.github.io
☆183 Updated last year
kq-chen / qwen-vl-utils
Helper functions for processing and integrating visual language information with Qwen-VL series models
☆15 Updated last year
StarRing2022 / R1-Nature
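The simplest reproduction of R1 results on a small model, explaining the most important essence of O1-like models and DeepSeek R1. Think is all your need. Experiments support the claim that, for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.
☆45 Updated 8 months ago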
thu-ml / zh-clip
☆72 Updated 2 years ago
xverse-ai / XVERSE-MoE-A4.2B
XVERSE-MoE-A4.2B: A multilingual large language model developed by XVERSE Technology Inc.
☆39 Updated last year
LinWeizheDragon / FLMR
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆98 Updated 4 months ago