TencentARC / mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
☆90 · Updated 8 months ago
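Since the repo's focus is training on Ascend hardware, a minimal sketch of what NPU-backed PyTorch setup generally looks like via the torch_npu adapter may help orient readers. This is generic torch_npu usage, not mllm-npu's own API; the device index, placeholder model, and dummy objective are all assumptions for illustration.

```python
import torch
import torch_npu  # Ascend's PyTorch adapter; registers the "npu" device type

# Placeholder model; mllm-npu itself trains full multimodal LLMs.
device = torch.device("npu:0")  # assumes at least one Ascend NPU is visible
model = torch.nn.Linear(4096, 4096).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device=device)
loss = model(x).pow(2).mean()  # dummy objective, just to exercise the device
loss.backward()
optimizer.step()
```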
Alternatives and similar repositories for mllm-npu:
Users interested in mllm-npu are comparing it to the libraries listed below.
- ☆164 · Updated 3 months ago
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆147 · Updated 6 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆189 · Updated 2 weeks ago
- A parallelized VAE that avoids OOM in high-resolution image generation ☆61 · Updated 3 months ago
- A lightweight, highly efficient training framework for accelerating diffusion tasks ☆47 · Updated 7 months ago
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources ☆194 · Updated last month
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework ☆306 · Updated last month
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆142 · Updated last month
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆121 · Updated last month
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 · Updated 2 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆202 · Updated 2 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆102 · Updated 9 months ago
- Pruning the VLLMs ☆92 · Updated 5 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆313 · Updated 2 weeks ago
- [ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity ☆191 · Updated last week
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training ☆338 · Updated this week
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆223 · Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆201 · Updated 4 months ago
- 📖 A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉 ☆218 · Updated 2 weeks ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆97 · Updated 8 months ago
- 📚 Collection of awesome generation acceleration resources ☆225 · Updated 2 weeks ago
- ☆115 · Updated 9 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆80 · Updated last month
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆490 · Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆163 · Updated this week
- ☆143 · Updated 3 months ago
- A paper list on efficient Mixture-of-Experts (MoE) for LLMs ☆64 · Updated 4 months ago
- Official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality" ☆46 · Updated last month
- [NeurIPS'24] An efficient and accurate memory-saving method for W4A4 large multi-modal models ☆73 · Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 · Updated 10 months ago