ByteDance-Seed / DeepFlow
[ICCV 2025] Deeply Supervised Flow-Based Generative Models
☆27Updated 5 months ago
Alternatives and similar repositories for DeepFlow
Users who are interested in DeepFlow are comparing it to the repositories listed below
- ☆138Updated 4 months ago
- ☆28Updated 3 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆136Updated 4 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…☆118Updated last month
- A collection of strong multimodal models for building multimodal AGI agents☆43Updated last year
- ☆50Updated 5 months ago
- ☆80Updated 8 months ago
- ☆74Updated 5 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆128Updated last year
- ☆78Updated 6 months ago
- ☆62Updated 4 months ago
- DELT: Data Efficacy for Language Model Training☆42Updated 3 months ago
- ☆129Updated 5 months ago
- ☆93Updated 8 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆43Updated last year
- ☆29Updated last year
- Official Implementation of APB (ACL 2025 main Oral)☆31Updated 9 months ago
- ☆186Updated 9 months ago
- The SAIL-VL2 series model developed by the BytedanceDouyinContent Group☆76Updated 2 months ago
- ☆61Updated 2 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook, and supports both image and video…☆37Updated last year
- Quick Long Video Understanding☆69Updated last month
- ☆17Updated 2 years ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆52Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆127Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆47Updated 9 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Updated 9 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration"☆124Updated 2 months ago
- [EMNLP 2025 Demo] PresentAgent: Multimodal Agent for Presentation Video Generation☆115Updated last week