OpenGVLab / FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
☆17Updated last week
Alternatives and similar repositories for FluxViT:
Users that are interested in FluxViT are comparing it to the libraries listed below
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆19Updated 5 months ago
- (CVPR 2025) Official implementation to DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation which outperforms SOTA…☆20Updated last month
- Official Implementation of KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models☆38Updated 5 months ago
- PyTorch implementation for our ICLR 2025 paper State Space Model Meets Transformer: A New Paradigm for 3D Object Detection☆19Updated this week
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆58Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆50Updated this week
- An open source implementation of CLIP (With TULIP Support)☆113Updated last week
- [NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Underst…☆22Updated 2 weeks ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling☆29Updated 4 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆45Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆59Updated 8 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆124Updated 2 months ago
- The official implementation of Preference Data Reward-Augmentation.☆17Updated 5 months ago
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks☆47Updated 6 months ago
- TensorFlow code for our ECCV'24 Workshop paper "LightAvatar: Efficient Head Avatar as Dynamic NeLF"☆28Updated 4 months ago
- ☆70Updated 3 weeks ago
- [CVPR 2025]Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆97Updated last week
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆75Updated 6 months ago
- The official implementation of Cross-Task Experience Sharing (COPS)☆21Updated 5 months ago
- [Arxiv] Implicit Reasoning in Transformers is Reasoning through Shortcuts☆12Updated 3 weeks ago
- [arXiv'25] Official Implementation of "Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning"☆16Updated 2 months ago
- we propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editi…☆30Updated 7 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?☆37Updated 9 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"☆66Updated 6 months ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆49Updated 3 weeks ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆140Updated 3 weeks ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 7 months ago
- ☆26Updated 3 weeks ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆36Updated last year
- ☆37Updated 5 months ago