ByteDance-Seed / DeepFlow
[ICCV 2025] Deeply Supervised Flow-Based Generative Models
☆27 Updated 4 months ago
Alternatives and similar repositories for DeepFlow
Users who are interested in DeepFlow are comparing it to the libraries listed below.
- ☆135 Updated 3 months ago
- ☆28 Updated 2 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆131 Updated 3 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆37 Updated last year
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] ☆90 Updated 3 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆126 Updated 11 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆127 Updated 11 months ago
- ☆92 Updated 8 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 Updated last year
- 🕹️ Explore cutting-edge techniques in game generation ☆49 Updated 2 months ago
- ☆78 Updated 7 months ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou… ☆26 Updated last month
- [EMNLP 2025 Demo] PresentAgent: Multimodal Agent for Presentation Video Generation ☆107 Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated last year
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆38 Updated last year
- ☆73 Updated 4 months ago
- ☆78 Updated 5 months ago
- ☆129 Updated 4 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca… ☆52 Updated 3 months ago
- ☆29 Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆47 Updated 8 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆98 Updated last week
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning ☆33 Updated 2 months ago
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆15 Updated last year
- ☆50 Updated 4 months ago
- ☆186 Updated 8 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024 ☆64 Updated 2 weeks ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆142 Updated last year
- ☆17 Updated 2 years ago
- 😊 TPTT: Transforming Pretrained Transformers into Titans ☆29 Updated 2 weeks ago