Variante / video-postproc-toolbox
Assorted small tools built for new video post-production workflows
☆19 · Updated last year
Alternatives and similar repositories for video-postproc-toolbox
Users interested in video-postproc-toolbox are comparing it to the repositories listed below
- [NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆89 · Updated 2 years ago
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation". ☆253 · Updated last year
- ☆125 · Updated last year
- [ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale ☆120 · Updated last year
- ☆33 · Updated last year
- [COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation ☆165 · Updated 6 months ago
- Build a daily academic subscription pipeline! Get daily arXiv papers and corresponding ChatGPT summaries with pre-defined keywords. It is… ☆47 · Updated 2 years ago
- Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in C… ☆399 · Updated 3 weeks ago
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning ☆33 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆153 · Updated 4 months ago
- A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains. ☆410 · Updated last year
- A collection of notable works around reasoning models such as O1/R1 in the visual domain ☆51 · Updated 5 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆128 · Updated last year
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" ☆259 · Updated last year
- This is for the ACL 2025 Findings paper "From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities" ☆85 · Updated last week
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆212 · Updated last year
- Periodically fetches relevant updates from Google Scholar and arXiv (the code is a single .py file, fairly simple and commented) ☆72 · Updated last year
- A collection of multimodal (MM) + Chat resources ☆280 · Updated 4 months ago
- [CVPR 2024] Official implementations of "CLIP-KD: An Empirical Study of CLIP Model Distillation" ☆136 · Updated 4 months ago
- Open-source implementation of "Vision Transformers Need Registers" ☆204 · Updated 2 months ago
- ☆120 · Updated last year
- [ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model ☆341 · Updated last year
- ☆138 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆317 · Updated last year
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆246 · Updated last year
- Awesome list of vision-language prompt papers ☆47 · Updated 2 years ago
- ☆92 · Updated 2 years ago
- [AAAI 2025] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆290 · Updated last year
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆218 · Updated 9 months ago
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆205 · Updated last year