bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
☆147Updated last week
Alternatives and similar repositories for Valley:
Users who are interested in Valley are comparing it to the repositories listed below:
- ☆160Updated 3 weeks ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆129Updated 7 months ago
- GLM Series Edge Models☆118Updated last week
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought☆188Updated last week
- ☆78Updated 8 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆105Updated 2 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation.☆94Updated this week
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".☆133Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆197Updated last month
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆127Updated 6 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆134Updated 2 months ago
- ☆189Updated 3 weeks ago
- ☆167Updated 6 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data, with a 14B LLM.☆36Updated 4 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆151Updated 5 months ago
- mllm-npu: training multimodal large language models on Ascend NPUs☆89Updated 4 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆108Updated last month
- ☆32Updated 7 months ago
- Multimodal Models in Real World☆425Updated 2 months ago
- Delta-CoMe achieves near-lossless 1-bit compression and has been accepted by NeurIPS 2024☆52Updated last month
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆133Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆209Updated 9 months ago
- ☆36Updated 2 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation☆107Updated last month
- We are the first character (role-play) large model fully available for commercial use.☆37Updated 4 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊☆255Updated 2 months ago
- 🔥🔥First-ever hour-scale video understanding models☆218Updated 2 weeks ago
- A Toolkit for Running On-device Large Language Models (LLMs) in APP☆58Updated 6 months ago
- ☆336Updated 2 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆117Updated last month