Variante / video-postproc-toolbox
Assorted small tools built for a new video post-production workflow
☆20 Updated 5 months ago
Alternatives and similar repositories for video-postproc-toolbox
Users who are interested in video-postproc-toolbox are comparing it to the repositories listed below
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆47 Updated 2 weeks ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆69 Updated 3 months ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆86 Updated last year
- ☆105 Updated 11 months ago
- Collection of papers and repos for multimodal chain-of-thought ☆83 Updated 7 months ago
- Visual self-questioning for large vision-language assistant. ☆41 Updated 8 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆209 Updated 2 months ago
- ☆76 Updated 7 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 Updated last year
- ☆133 Updated 11 months ago
- https://www.shoufachen.com/Awesome-Diffusion-Transformers/ ☆142 Updated last year
- Build a daily academic subscription pipeline! Get daily Arxiv papers and corresponding chatGPT summaries with pre-defined keywords. It is… ☆38 Updated 2 years ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆104 Updated 2 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆102 Updated 9 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆55 Updated 10 months ago
- ☆115 Updated 10 months ago
- Official implementation for paper "Knowledge Diffusion for Distillation", NeurIPS 2023 ☆86 Updated last year
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆176 Updated 4 months ago
- Keras implement of Finite Scalar Quantization ☆73 Updated last year
- ☆31 Updated 5 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 Updated 8 months ago
- HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model ☆40 Updated 2 weeks ago
- ☆109 Updated last year
- ☆117 Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆65 Updated last month
- LMM solved catastrophic forgetting, AAAI2025 ☆43 Updated last month
- A collection of visual instruction tuning datasets. ☆76 Updated last year
- [CVPR2024] ModaVerse: Efficiently Transforming Modalities with LLMs ☆29 Updated 11 months ago
- ☆87 Updated 2 years ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆156 Updated 8 months ago