HITsz-TMG / UMOE-Scaling-Unified-Multimodal-LLMs
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
☆689 · Updated 3 weeks ago
Alternatives and similar repositories for UMOE-Scaling-Unified-Multimodal-LLMs:
Users that are interested in UMOE-Scaling-Unified-Multimodal-LLMs are comparing it to the libraries listed below
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆618 · Updated this week
- Align Anything: Training All-modality Model with Feedback ☆2,154 · Updated this week
- Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module: lmms-eval. ☆2,116 · Updated this week
- Build multimodal language agents for fast prototype and production ☆1,777 · Updated this week
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models ☆885 · Updated 3 weeks ago
- 【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models ☆1,708 · Updated this week
- An MBTI Exploration of Large Language Models ☆457 · Updated last year
- [IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation ☆1,004 · Updated 3 months ago
- [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,039 · Updated 4 months ago
- ☆1,381 · Updated 4 months ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆127 · Updated last month
- [ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists" ☆460 · Updated 9 months ago
- Official repository of MMGenBench ☆119 · Updated 3 months ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆253 · Updated 10 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models ☆96 · Updated 7 months ago
- Improving Generalist Model with Domain-Specific Experts ☆82 · Updated last month
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆886 · Updated last month
- SDG is a specialized framework designed to generate high-quality structured tabular data. ☆2,308 · Updated last week
- ☆158 · Updated 4 months ago
- SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling ☆1,026 · Updated last month
- Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs ☆148 · Updated 4 months ago
- Real-time and accurate open-vocabulary end-to-end object detection ☆1,167 · Updated 2 months ago
- [NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy ☆61 · Updated 3 weeks ago
- OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24] ☆1,231 · Updated 2 months ago
- Code for the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators" ☆193 · Updated 6 months ago
- Video generation from text & image, 1st-gen ☆745 · Updated last week
- PyTorch implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model. ☆873 · Updated 2 months ago
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks. ☆196 · Updated 7 months ago