baidubce / Qianfan-VL
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
☆158 · Updated 3 weeks ago
Alternatives and similar repositories for Qianfan-VL
Users interested in Qianfan-VL are comparing it to the repositories listed below.
- Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce…" ☆377 · Updated last week
- ☆186 · Updated 8 months ago
- ☆92 · Updated 3 weeks ago
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆226 · Updated 4 months ago
- A toolkit for knowledge distillation of large language models ☆171 · Updated last week
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab ☆260 · Updated 3 weeks ago
- Dataset and code for our ACL 2024 paper "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train … ☆218 · Updated 4 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆93 · Updated 2 months ago
- PDF parsing toolkit: a vLLM-accelerated implementation of GOT, using MinerU for layout detection and cropping and GOT for table and formula parsing, to enable PDF parsing for RAG ☆63 · Updated 11 months ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆198 · Updated 3 weeks ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆63 · Updated 11 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, image, and video data ☆252 · Updated 2 months ago
- OpenSeek aims to unite the global open-source community to drive collaborative innovation in algorithms, data, and systems to develop next… ☆232 · Updated last month
- Max's awesome datasets ☆49 · Updated last month
- a-m-team's exploration in large language modeling ☆189 · Updated 4 months ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆155 · Updated last year
- Ling is a MoE LLM provided and open-sourced by InclusionAI ☆215 · Updated 5 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆332 · Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆383 · Updated 5 months ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆228 · Updated 6 months ago
- Research code for the Multimodal-Cognition Team at Ant Group ☆167 · Updated last week
- [arXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling ☆126 · Updated 4 months ago
- This project collects and collates various datasets for multimodal large-model training, including but not limited to pre-training … ☆57 · Updated 5 months ago
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models) ☆178 · Updated 2 years ago
- ☆296 · Updated 4 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆396 · Updated 5 months ago
- ☆68 · Updated 2 months ago
- SUS-Chat: Instruction tuning done right ☆49 · Updated last year
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation ☆389 · Updated last week
- The official repo of "One RL to See Them All: Visual Triple Unified Reinforcement Learning" ☆318 · Updated 4 months ago