mrseanryan / finetune_LLaVA
Fine-tune LLaVA 1.5 (based on an article by wandb)
☆12 · Updated last year
Alternatives and similar repositories for finetune_LLaVA:
Users interested in finetune_LLaVA are comparing it to the libraries listed below:
- [CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of t… ☆35 · Updated 3 months ago
- [ECAI 2023] MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient ☆30 · Updated last year
- Official PyTorch implementation of "MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation" ☆16 · Updated 5 months ago
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆21 · Updated last month
- [IEEE T-IV] A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆50 · Updated 11 months ago
- Code for the CVPR 2025 paper "MMRL: Multi-Modal Representation Learning for Vision-Language Models" ☆32 · Updated last month
- Official PyTorch implementation of "Stereo3DMOT: Stereo Vision Based 3D Multi-Object Tracking with Multimodal ReID" (PRCV 2023) ☆21 · Updated 10 months ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆90 · Updated last year
- Taming Self-Training for Open-Vocabulary Object Detection, CVPR 2024 ☆21 · Updated last year
- Public repository for the ECCV 2024 paper "Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation" ☆23 · Updated 7 months ago
- [NeurIPS 2023] HASSOD: Hierarchical Adaptive Self-Supervised Object Detection ☆56 · Updated last year
- [IJCV 2025] MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning ☆62 · Updated last year
- A question bank of interview questions for data-related roles ☆10 · Updated last year
- [CVPR 2024] Official PyTorch code of SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects ☆99 · Updated 2 weeks ago
- [CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding ☆136 · Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆52 · Updated 6 months ago
- Testbed for multimodal retrieval-augmented generation techniques with FiftyOne, LlamaIndex, and Milvus ☆18 · Updated 9 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆23 · Updated last week
- A dual-branch conditional diffusion model designed to enhance driving scene generation across multiple views and video sequences ☆24 · Updated this week
- arxiv-daily ☆79 · Updated 3 years ago
- Official implementation of DINO-Foresight: Looking into the Future with DINO ☆50 · Updated 2 months ago
- Official implementation of the CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection" ☆37 · Updated 7 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆66 · Updated last year