Variante / video-postproc-toolbox
Various small tools built for a new video post-production workflow
☆17 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for video-postproc-toolbox
- ☆113 · Updated 5 months ago
- ☆55 · Updated 3 weeks ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆94 · Updated 8 months ago
- Open-source implementation of "Vision Transformers Need Registers" ☆143 · Updated last week
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆78 · Updated 8 months ago
- An efficient PyTorch implementation of selective scan in one file; works on both CPU and GPU, with the corresponding mathematical derivation ☆71 · Updated 8 months ago
- The official implementation of "Adapter is All You Need for Tuning Visual Tasks". ☆72 · Updated 2 months ago
- This repository includes the official implementation of our paper "Scaling White-Box Transformers for Vision" ☆45 · Updated 5 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆66 · Updated 5 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆61 · Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆132 · Updated last month
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models. ☆217 · Updated 2 weeks ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Learners" ☆98 · Updated 6 months ago
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ☆132 · Updated last month
- ☆48 · Updated 5 months ago
- ☆109 · Updated 5 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆135 · Updated 2 weeks ago
- ☆26 · Updated 7 months ago
- ☆90 · Updated 6 months ago
- [NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆83 · Updated 11 months ago
- [NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers" ☆132 · Updated last month
- Explore the Limits of Omni-modal Pretraining at Scale ☆89 · Updated 2 months ago
- PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding, accepted at CVPR 2024. ☆182 · Updated 5 months ago
- Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770). ☆148 · Updated last month
- ☆105 · Updated 3 months ago
- [IEEE TCSVT] Official PyTorch implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆35 · Updated 3 weeks ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆77 · Updated 5 months ago
- 🔥 ImageFolder: Autoregressive Image Generation with Folded Tokens ☆57 · Updated last week
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆30 · Updated last month
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆179 · Updated 5 months ago