Variante / video-postproc-toolbox
Assorted small tools built for a new video post-production workflow
☆20 Updated 4 months ago
Alternatives and similar repositories for video-postproc-toolbox:
Users who are interested in video-postproc-toolbox are comparing it to the repositories listed below:
- ☆70 Updated 5 months ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆86 Updated last year
- ☆130 Updated 10 months ago
- Visual self-questioning for large vision-language assistant. ☆41 Updated 6 months ago
- HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model ☆28 Updated last month
- ☆99 Updated 9 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆56 Updated 8 months ago
- Build a daily academic subscription pipeline! Get daily Arxiv papers and corresponding chatGPT summaries with pre-defined keywords. It is… ☆38 Updated 2 years ago
- ☆109 Updated last year
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning ☆110 Updated 8 months ago
- A collection of omni-mllm ☆21 Updated last week
- ☆82 Updated 11 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆137 Updated 9 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆48 Updated 2 weeks ago
- ☆36 Updated 9 months ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ☆86 Updated 3 weeks ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆57 Updated 4 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆33 Updated 2 weeks ago
- [NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples ☆51 Updated 5 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆123 Updated 5 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition ☆76 Updated 2 weeks ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆84 Updated 8 months ago
- [NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP) ☆88 Updated this week
- R1-Vision: Let's first take a look at the image ☆46 Updated 2 months ago
- ☆116 Updated 10 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆97 Updated 7 months ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆69 Updated 2 months ago
- Official repository of Uni-AdaFocus (TPAMI 2024). ☆41 Updated 4 months ago
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning ☆29 Updated 10 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 Updated last year