lzw-lzw / UnifiedMLLM
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆17 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for UnifiedMLLM
- [ICLR2024] The official implementation of the paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆69 · Updated 9 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆48 · Updated last month
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆55 · Updated 2 weeks ago
- Data-Efficient Multimodal Fusion on a Single GPU ☆47 · Updated 6 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 · Updated 3 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆56 · Updated last year
- PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆12 · Updated last month
- Official implementation of TagAlign ☆32 · Updated 7 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 · Updated 4 months ago
- An LMM that is a strict superset of its embedded LLM ☆31 · Updated this week
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆11 · Updated 8 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆35 · Updated 7 months ago
- Official repository of the paper "Subobject-level Image Tokenization" ☆62 · Updated 6 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆81 · Updated 4 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆27 · Updated 5 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆40 · Updated last week
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆27 · Updated 4 months ago
- Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆40 · Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆23 · Updated this week
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆17 · Updated 2 months ago
- An official PyTorch implementation of the AAAI 2024 paper "Latent Space Editing in Transformer-based Flow Matching" ☆27 · Updated 6 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆39 · Updated 3 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆22 · Updated 4 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆28 · Updated last month