LAVIS - A One-stop Library for Language-Vision Intelligence
☆48 · Aug 5, 2024 · Updated last year
Alternatives and similar repositories for LAVIS-XInstructBLIP
Users who are interested in LAVIS-XInstructBLIP are comparing it to the libraries listed below.
- Creative Instructions Project ☆11 · Sep 4, 2023 · Updated 2 years ago
- ☆14 · Apr 25, 2025 · Updated 10 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning ☆45 · Jul 2, 2025 · Updated 8 months ago
- Official implementation of T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition ☆21 · Oct 23, 2024 · Updated last year
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆37 · Oct 18, 2023 · Updated 2 years ago
- Official code repository of "DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection" ☆22 · Aug 22, 2024 · Updated last year
- Detect Anything (zero-shot detection + recognition) demo for SG2300X [Recognize Anything + GroundingDINO] ☆24 · May 9, 2024 · Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆57 · Jul 25, 2023 · Updated 2 years ago
- [IROS 2023] DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception ☆32 · Nov 28, 2023 · Updated 2 years ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) ☆34 · Mar 24, 2025 · Updated 11 months ago
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models" ☆27 · Nov 29, 2023 · Updated 2 years ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆665 · Oct 22, 2024 · Updated last year
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆34 · Oct 16, 2024 · Updated last year
- Download the Flickr8k and Flickr30k image caption datasets ☆42 · Feb 6, 2024 · Updated 2 years ago
- The code repository of UniRL ☆51 · May 30, 2025 · Updated 9 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆37 · Nov 27, 2024 · Updated last year
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023) ☆35 · Apr 23, 2024 · Updated last year
- ☆48 · Dec 13, 2024 · Updated last year
- Translation software for Windows, offering select-to-translate, screenshot translation, AI translation, and other features ☆12 · Apr 24, 2025 · Updated 10 months ago
- ☆10 · Dec 8, 2025 · Updated 2 months ago
- A large-scale training and benchmarking framework for rPPG. ☆10 · Nov 26, 2024 · Updated last year
- ☆10 · Oct 13, 2024 · Updated last year
- Work in Progress: A resource for people transitioning from ArcGIS to R for spatial stuff ☆13 · Mar 22, 2021 · Updated 4 years ago
- ZJU-OPT (College of Optical Science and Engineering, Zhejiang University) Machine Vision and Image Processing course project. This is a deep learning project that attempts to detect defects on silicon solar panels. ☆12 · Apr 8, 2023 · Updated 2 years ago
- ☆14 · Jul 2, 2023 · Updated 2 years ago
- ☆17 · May 14, 2025 · Updated 9 months ago
- Resources for our IJCAI 2020 paper, TopicKA: Generating Commonsense Knowledge-Aware Dialogue Responses Towards the Recommended Topic Fact ☆12 · Nov 30, 2020 · Updated 5 years ago
- (AAAI 2026) OSVBench, a new benchmark for evaluating Large Language Models (LLMs) in generating complete specification code pertaining to… ☆13 · May 13, 2025 · Updated 9 months ago
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆255 · Feb 11, 2025 · Updated last year
- ☆38 · Jan 4, 2024 · Updated 2 years ago
- [Toutiao] Text authorship identification competition ☆10 · Aug 20, 2018 · Updated 7 years ago
- Implementation of the paper "Exploiting Time-Frequency Conformers for Music Audio Enhancement" ☆12 · Mar 21, 2025 · Updated 11 months ago
- Speech Separation ☆10 · Jan 6, 2022 · Updated 4 years ago
- ☆14 · Nov 29, 2025 · Updated 3 months ago
- TensorFlow implementation of GANomaly (with MNIST dataset) ☆10 · Dec 2, 2020 · Updated 5 years ago
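- A Chinese-language guide to Sora 🔥: Chinese tuning guide, prompt guide, application development guide, curated resource list, and curated tools and frameworks for Sora developers 🚀 ☆17 · Updated this week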
- Sample and Computation Redistribution for Efficient Face Detection ☆16 · May 13, 2024 · Updated last year
- Image dataset augmentation for machine learning ☆14 · Jun 8, 2023 · Updated 2 years ago
- Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR ☆13 · Jul 21, 2025 · Updated 7 months ago