zhongyy / VIoTGPT
Code for the AAAI 2025 paper "VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things"
☆14 · Updated 6 months ago
Alternatives and similar repositories for VIoTGPT
Users who are interested in VIoTGPT are comparing it to the repositories listed below.
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆27 · Updated last year
- [IJCV 2025] Code for DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection ☆53 · Updated 7 months ago
- ☆38 · Updated last year
- ☆87 · Updated last year
- [WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling ☆53 · Updated 3 months ago
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model ☆17 · Updated last year
- Video dataset dedicated to portrait-mode video recognition. ☆52 · Updated 8 months ago
- Teach-DETR: Better Training DETR with Teachers ☆31 · Updated last year
- ☆43 · Updated 2 years ago
- [ACM MM2025] The official repository for the RealSyn dataset ☆36 · Updated last month
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆38 · Updated last year
- Our 2nd-gen LMM ☆34 · Updated last year
- ☆34 · Updated last year
- [NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception ☆43 · Updated last year
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 · Updated last year
- Benchmarking Attention Mechanism in Vision Transformers. ☆18 · Updated 2 years ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆27 · Updated 3 weeks ago
- Official implementation of TagAlign ☆35 · Updated 8 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 · Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆29 · Updated 2 months ago
- Large-batch Optimization for Dense Visual Predictions (NeurIPS 2022) ☆57 · Updated 2 years ago
- Masked Vision-Language Transformer in Fashion ☆35 · Updated last year
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆43 · Updated 7 months ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 · Updated last year
- ☆19 · Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆47 · Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆44 · Updated last year
- Turning to Video for Transcript Sorting ☆48 · Updated last year
- [CBMI2024 Best Paper] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?". ☆28 · Updated 3 months ago
- ☆19 · Updated 2 years ago