percent4 / yi_vl_experiment
This project contains experiments with and applications of Yi's multimodal model series, such as Yi-VL-6B/34B.
☆13 · Updated last year
Alternatives and similar repositories for yi_vl_experiment:
Users interested in yi_vl_experiment are comparing it to the libraries listed below:
- A multimodal large-scale model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 · Updated last year
- A curated collection of information on data competitions in China and abroad ☆18 · Updated 3 years ago
- Stable Diffusion in TensorRT 8.5+ ☆14 · Updated 2 years ago
- NVIDIA TensorRT Hackathon 2023 final-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM ☆41 · Updated last year
- TensorFlow implementation for Dash ☆32 · Updated 2 years ago
- A music large model based on InternLM2-chat. ☆22 · Updated 3 months ago
- Chinese CLIP models with SOTA performance. ☆54 · Updated last year
- ☆25 · Updated 3 months ago
- Whisper in TensorRT-LLM ☆15 · Updated last year
- Comparison of LLM API performance metrics: an in-depth analysis of key indicators such as TTFT and TPS ☆16 · Updated 6 months ago
- Our 2nd-gen LMM ☆33 · Updated 10 months ago
- An AIGC application integrating an LLM with SDXL ☆27 · Updated last year
- qwen2 and llama3 cpp implementation ☆43 · Updated 9 months ago
- Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up…" ☆22 · Updated 3 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆23 · Updated last year
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image search and image-to-image search. ☆20 · Updated last year
- Workshop on Foundation Model: 1st foundation model challenge, Track 1 codebase (Open TransMind v1.0) ☆18 · Updated 2 years ago
- Tianchi NVIDIA TensorRT Hackathon 2023 generative AI model optimization competition: third-place solution in the preliminary round ☆49 · Updated last year
- The llava-Qwen2-7B-Instruct-Chinese-CLIP model enhances Chinese text recognition and meme-connotation recognition, approaching the recognition level of gpt4o and claude-3.5-sonnet! ☆22 · Updated 8 months ago
- Code for the AAAI 2025 paper "VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things" ☆11 · Updated 2 months ago
- Real-time video understanding and interaction through text, audio, image, and video with a large multimodal model. A framework for real-time video understanding and interaction using multimodal large models, via text… ☆23 · Updated last year
- A multimodal large model implemented from scratch and named Reyes (睿视): R for 睿, eyes for 眼. Reyes has 8B parameters, uses InternViT-300M-448px-V2_5 as the vision encoder and Qwen2.5-7B-Instruct on the language-model side, and also uses a two-layer MLP projection layer to connect… ☆10 · Updated last month
- Training code for Taiyi-Diffusion-XL ☆21 · Updated 9 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder; unifying image understanding and generation. ☆35 · Updated 9 months ago
- WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning. ☆13 · Updated 11 months ago
- ☆27 · Updated 10 months ago
- Code for "An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought"☆12Updated 8 months ago
- A dead-simple and modularized multi-modal training and finetuning framework, compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc. series … ☆18 · Updated 11 months ago
- Xtuner Factory ☆33 · Updated last year
- Building a VLM model starting from the basic modules. ☆14 · Updated 11 months ago