ByteDance-Seed / DeepFlow
[ICCV 2025] Deeply Supervised Flow-Based Generative Models
☆24Updated 2 months ago
Alternatives and similar repositories for DeepFlow
Users that are interested in DeepFlow are comparing it to the libraries listed below
- ☆123Updated last month
- ☆27Updated 2 weeks ago
- ☆78Updated 5 months ago
- ☆92Updated 6 months ago
- ☆77Updated 4 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆122Updated last month
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆49Updated last month
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆124Updated 9 months ago
- ☆29Updated last year
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated 11 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆142Updated 10 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆126Updated 9 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]☆84Updated last month
- ☆128Updated 2 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆61Updated 6 months ago
- ☆55Updated last month
- ☆54Updated this week
- A lightweight, highly efficient training framework for accelerating diffusion tasks.☆49Updated 11 months ago
- ☆34Updated 7 months ago
- ☆27Updated last week
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆43Updated last year
- Quick Long Video Understanding☆62Updated 2 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆46Updated 6 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆47Updated last month
- ☆74Updated last year
- ☆17Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆37Updated last year
- ☆80Updated 5 months ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆24Updated last month
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning☆25Updated last week