ByteDance-Seed / DeepFlow
[ICCV 2025] Deeply Supervised Flow-Based Generative Models
☆24Updated last month
Alternatives and similar repositories for DeepFlow
Users who are interested in DeepFlow are comparing it to the repositories listed below
- ☆24Updated last month
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆35Updated last year
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆108Updated 3 weeks ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]☆82Updated 2 weeks ago
- ☆75Updated 5 months ago
- ☆29Updated 11 months ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆20Updated last week
- ☆112Updated last week
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 10 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆125Updated 9 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 5 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆44Updated last year
- ☆96Updated this week
- ☆34Updated 6 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆45Updated 2 weeks ago
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation☆15Updated 9 months ago
- ☆68Updated last month
- Quick Long Video Understanding☆60Updated last month
- ☆51Updated last month
- ☆73Updated last year
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning☆20Updated last week
- [ACL2025 Oral] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆87Updated last month
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆47Updated 2 weeks ago
- ☆90Updated 5 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆126Updated last month
- Our 2nd-gen LMM☆34Updated last year
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"☆58Updated 2 weeks ago
- Scaling Preference Data Curation via Human-AI Synergy☆95Updated last month
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]☆73Updated last month
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆33Updated 2 months ago