zhongyy / VIoTGPT
Code for the AAAI 2025 paper "VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things"
☆14Updated 5 months ago
Alternatives and similar repositories for VIoTGPT
Users who are interested in VIoTGPT are comparing it to the repositories listed below.
- ☆38Updated last year
- Invariant Feature Regularization for Fair Face Recognition (ICCV'23)☆15Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆40Updated 9 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆20Updated 2 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆37Updated last year
- Official implementation of TagAlign☆35Updated 6 months ago
- repo for paper titled: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment (AAAI'24 Oral)☆25Updated last year
- LMM solved catastrophic forgetting, AAAI2025☆43Updated 2 months ago
- ☆19Updated last year
- ☆43Updated 2 years ago
- [NeurIPS2024 Oral] PyTorch implementation of DenoiseRep☆26Updated 2 weeks ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆28Updated 2 months ago
- [ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching☆79Updated last year
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"☆18Updated last week
- [IJCV 2025] Code for DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection☆51Updated 6 months ago
- [NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception☆43Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM☆46Updated last year
- Code and datasets for the TPAMI 2022 paper "OPOM: Customized Invisible Cloak towards Face Privacy Protection"☆22Updated 3 years ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Updated last year
- Turning to Video for Transcript Sorting☆48Updated last year
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.☆30Updated 9 months ago
- ☆118Updated last year
- ☆117Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆26Updated last year
- ☆52Updated last year
- [NeurIPS 2024] Lumen: a Large multimodal model with versatile vision-centric capabilities☆24Updated 9 months ago
- ☆16Updated 2 years ago
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation☆32Updated 2 months ago
- [CVPR2025] Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters☆37Updated 3 months ago
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning☆30Updated last year