qiujihao19 / Artemis
☆20Updated last month
Related projects
Alternatives and complementary repositories for Artemis
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆54Updated 2 months ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!☆117Updated 10 months ago
- The paper collections for the autoregressive models in vision.☆95Updated this week
- This is a repo to track the latest autoregressive visual generation papers.☆41Updated 3 weeks ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆77Updated 7 months ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook☆40Updated 4 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆56Updated 2 weeks ago
- LLMBind: A Unified Modality-Task Integration Framework☆15Updated 4 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆81Updated 4 months ago
- 🔥ImageFolder: Autoregressive Image Generation with Folded Tokens☆53Updated 3 weeks ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆76Updated 3 weeks ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression☆37Updated 3 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆41Updated last week
- ☆72Updated 5 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆65Updated 3 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆115Updated last month
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models☆227Updated last month
- 【NeurIPS 2024】Dense Connector for MLLMs☆133Updated 3 weeks ago
- Official repository of MMDU dataset☆74Updated last month
- Official implementation of MIA-DPO☆32Updated this week
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models☆55Updated this week
- Making LLaVA Tiny via MoE-Knowledge Distillation☆55Updated 2 weeks ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆48Updated last month
- ☆47Updated last week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 4 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation☆31Updated last month
- ☆21Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆105Updated last week
- ☆119Updated last month
- Official implementation of the Law of Vision Representation in MLLMs☆128Updated 2 months ago