Variante / video-postproc-toolbox
Assorted small tools built for a new video post-production workflow
☆20Updated 4 months ago
Alternatives and similar repositories for video-postproc-toolbox
Users that are interested in video-postproc-toolbox are comparing it to the libraries listed below
- ☆74Updated 6 months ago
- ☆109Updated last year
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆69Updated 3 months ago
- ☆86Updated 2 years ago
- MLLM @ Game☆14Updated last week
- Visual self-questioning for large vision-language assistant.☆41Updated 7 months ago
- HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model☆34Updated last month
- ☆41Updated 4 months ago
- Adapting LLaMA Decoder to Vision Transformer☆28Updated 11 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆57Updated 7 months ago
- Explore the Limits of Omni-modal Pretraining at Scale☆98Updated 8 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆78Updated 2 months ago
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning☆36Updated 2 weeks ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model☆86Updated last year
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆34Updated last month
- ☆116Updated 11 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆104Updated 2 weeks ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆81Updated last year
- ☆132Updated 10 months ago
- Build a daily academic subscription pipeline! Get daily Arxiv papers and corresponding chatGPT summaries with pre-defined keywords. It is…☆38Updated 2 years ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆137Updated 6 months ago
- ☆75Updated 4 months ago
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆36Updated 3 months ago
- A collection of visual instruction tuning datasets.☆76Updated last year
- ☆84Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Updated last year
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆55Updated 9 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆140Updated 10 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆51Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆156Updated 7 months ago