Variante / video-postproc-toolbox
Assorted small tools built for a new video post-production workflow
☆18 · Updated 5 months ago
Related projects:
- Open source implementation of "Vision Transformers Need Registers" · ☆126 · Updated last week
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning · ☆58 · Updated 3 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" · ☆75 · Updated 5 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities · ☆85 · Updated 6 months ago
- The official implementation of "Adapter is All You Need for Tuning Visual Tasks" · ☆67 · Updated 3 weeks ago
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference · ☆110 · Updated 8 months ago
- This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision · ☆53 · Updated 3 months ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning · ☆110 · Updated 3 weeks ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model · ☆80 · Updated 9 months ago
- [CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" · ☆75 · Updated 6 months ago
- The official implementation of GrootVL: Tree Topology is All You Need in State Space Model · ☆58 · Updated 3 months ago
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations · ☆152 · Updated 2 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition · ☆67 · Updated last month
- Visual self-questioning for large vision-language assistants · ☆22 · Updated 3 weeks ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs · ☆72 · Updated 3 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… · ☆96 · Updated 4 months ago
- [ICCV2023] DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models · ☆150 · Updated 10 months ago
- Explore the Limits of Omni-modal Pretraining at Scale · ☆80 · Updated 2 weeks ago
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" · ☆181 · Updated 8 months ago
- The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers" · ☆64 · Updated 3 months ago
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" · ☆160 · Updated 6 months ago
- Official implementation for paper "Knowledge Diffusion for Distillation", NeurIPS 2023 · ☆72 · Updated 7 months ago
- Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor · ☆92 · Updated 2 months ago