Visual Generation Tuning
☆99Jan 27, 2026Updated last month
Alternatives and similar repositories for VGT
Users that are interested in VGT are comparing it to the libraries listed below
Sorting:
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"☆53Jul 5, 2025Updated 8 months ago
- [ICCV 2025] FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models☆19Aug 26, 2025Updated 6 months ago
- Towards Scalable Pre-training of Visual Tokenizers for Generation☆445Dec 16, 2025Updated 2 months ago
- ThinkGen: Generalized Thinking for Visual Generation☆51Dec 30, 2025Updated 2 months ago
- ☆37Oct 17, 2025Updated 4 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆40May 26, 2025Updated 9 months ago
- Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization☆61Sep 19, 2025Updated 5 months ago
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception"☆29Jan 13, 2026Updated last month
- Official Implementation of MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models☆12Nov 1, 2025Updated 4 months ago
- Debiasing Through Data Attribution☆12May 23, 2024Updated last year
- Open Set Semantic Segmentation☆10Dec 23, 2020Updated 5 years ago
- [ACM MM 2023] The released code of paper "Deconfounded Visual Question Generation with Causal Inference"☆11Sep 3, 2024Updated last year
- Code for the AAAI 2024 paper: "AGS: Affordable and Generalizable Substitute Training for Transferable Adversarial Attack" (accepted).☆12Mar 28, 2024Updated last year
- ☆13Nov 5, 2024Updated last year
- [CVPR 2025] Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation☆19Dec 18, 2025Updated 2 months ago
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology☆12Jun 17, 2025Updated 8 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos"☆20Dec 14, 2025Updated 2 months ago
- [Neurips 2024] This repository is the official implementation of the Spatio-hemispherical equivariant convolution for dMRI deconvolution …☆10Dec 24, 2024Updated last year
- [ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"☆13Aug 6, 2024Updated last year
- ☆15Feb 11, 2025Updated last year
- This repository collects awesome representative papers and resources for "From Pre-training to Post-training: A Survey on Time Series Fou…☆31Feb 1, 2026Updated last month
- Optimizable stack of images at different resolutions, a useful representation of images for deep learning tasks. Docs: https://johnowhita…☆11Sep 8, 2022Updated 3 years ago
- Generating Summaries with Controllable Readability Levels (EMNLP 2023)☆15Aug 6, 2025Updated 6 months ago
- Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions☆22Feb 11, 2026Updated 3 weeks ago
- Antonino Furnari's fork of Feichtenhofer's gpu_flow, with temporal dilation.☆10Sep 18, 2020Updated 5 years ago
- [ICML 2025 Oral] This is the official repository of the paper "What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensi…☆20Jun 12, 2025Updated 8 months ago
- Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion☆45Aug 1, 2024Updated last year
- Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024)☆10Aug 26, 2024Updated last year
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"☆14Nov 1, 2024Updated last year
- FunASR安卓端侧离线版本2pass全模式☆14Sep 4, 2023Updated 2 years ago
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation☆30Dec 22, 2025Updated 2 months ago
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025)☆11Aug 11, 2025Updated 6 months ago
- VideoGPA is a self-supervised framework that enhances 3D consistency in Video Diffusion Models.☆35Feb 3, 2026Updated last month
- official PyTorch implementation of paper "Adversarial Bipartite Graph Learning for Video Domain Adaptation" (MM2020 Oral)☆11Jun 16, 2022Updated 3 years ago
- ☆18Apr 7, 2025Updated 10 months ago
- ☆12Dec 4, 2024Updated last year
- Code release for "MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos"(CVPR2023)☆14Dec 14, 2023Updated 2 years ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs☆12Nov 7, 2024Updated last year
- ☆28Jan 11, 2026Updated last month