OpenGVLab / PVC
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
⭐ 19 · Updated 3 weeks ago
Alternatives and similar repositories for PVC:
Users who are interested in PVC are comparing it to the repositories listed below.
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" · ⭐ 30 · Updated 6 months ago
- The official repository for the paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models" · ⭐ 22 · Updated 2 weeks ago
- ⭐ 17 · Updated this week
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" · ⭐ 45 · Updated 2 months ago
- FQGAN: Factorized Visual Tokenization and Generation · ⭐ 39 · Updated this week
- ⭐ 37 · Updated last year
- Open implementation of "RandAR" · ⭐ 46 · Updated last week
- Code release for "SegLLM: Multi-round Reasoning Segmentation" · ⭐ 55 · Updated this week
- The repository contains the official implementation of "Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation" · ⭐ 27 · Updated last month
- ⭐ 33 · Updated 2 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM · ⭐ 18 · Updated 3 weeks ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts · ⭐ 58 · Updated 3 months ago
- ⭐ 58 · Updated last year
- ⭐ 26 · Updated 5 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation · ⭐ 38 · Updated last month
- ⭐ 42 · Updated last week
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection · ⭐ 40 · Updated last week
- Retrieval-Augmented Personalization · ⭐ 11 · Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators · ⭐ 57 · Updated 3 weeks ago
- ⭐ 22 · Updated last month
- Learning 1D Causal Visual Representation with De-focus Attention Networks · ⭐ 32 · Updated 7 months ago
- ⭐ 16 · Updated last year
- Code for the paper "Towards Semantic Equivalence of Tokenization in Multimodal LLM" · ⭐ 46 · Updated 3 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… · ⭐ 34 · Updated last month
- A collection of vision foundation models unifying understanding and generation · ⭐ 32 · Updated last week
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing · ⭐ 18 · Updated last week
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding" · ⭐ 20 · Updated last year