Variante / video-postproc-toolbox
Various small tools built for new video post-production workflows
☆19 · Updated last year
Alternatives and similar repositories for video-postproc-toolbox
Users interested in video-postproc-toolbox are comparing it to the libraries listed below
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning ☆33 · Updated last year
- [COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation ☆166 · Updated 7 months ago
- [ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model ☆342 · Updated last year
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation". ☆252 · Updated 2 years ago
- [ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale ☆122 · Updated last year
- [NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆89 · Updated 2 years ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆390 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆153 · Updated 5 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 · Updated last year
- ☆138 · Updated last year
- Build a daily academic subscription pipeline! Get daily Arxiv papers and corresponding chatGPT summaries with pre-defined keywords. It is… ☆46 · Updated 2 years ago
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆293 · Updated last year
- ☆33 · Updated last year
- ☆125 · Updated last year
- This repository contains the implementation for the paper "EMP-SSL: Towards Self-Supervised Learning in One Training Epoch." ☆227 · Updated 2 years ago
- [ACL 2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆90 · Updated 8 months ago
- [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆101 · Updated last year
- A collection of awesome works built around reasoning models like O1/R1 in the visual domain ☆53 · Updated 6 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆247 · Updated 2 years ago
- Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in C… ☆404 · Updated last month
- VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. ☆243 · Updated 3 weeks ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆79 · Updated last month
- This is for the ACL 2025 Findings paper "From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities" ☆89 · Updated last month
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆227 · Updated 10 months ago
- Visual self-questioning for large vision-language assistants ☆45 · Updated 6 months ago
- ☆120 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆317 · Updated last year
- Official Implementation of the ECCV 2024 Paper: "CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts" ☆54 · Updated 3 months ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆668 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆252 · Updated 11 months ago