TencentARC / mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
☆83 Updated 2 months ago
Related projects
Alternatives and complementary repositories for mllm-npu
- ☆100 Updated last month
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆137 Updated 3 weeks ago
- 📒A small curated list of Awesome Diffusion Inference Papers with codes. ☆96 Updated this week
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆197 Updated 7 months ago
- A lightweight, highly efficient training framework for accelerating diffusion tasks. ☆41 Updated 2 months ago
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆91 Updated 2 weeks ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆179 Updated last month
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆274 Updated 3 months ago
- ☆68 Updated last week
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆69 Updated last week
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ☆231 Updated last month
- ☆105 Updated 3 months ago
- 🔥🔥First-ever hour-scale video understanding models ☆166 Updated 3 weeks ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆38 Updated 7 months ago
- A parallel VAE that avoids OOM in high-resolution image generation ☆40 Updated last month
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training) ☆127 Updated 5 months ago
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆274 Updated this week
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale