[ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance".
☆27Jun 16, 2025Updated 8 months ago
Alternatives and similar repositories for OmniBal
Users that are interested in OmniBal are comparing it to the libraries listed below
Sorting:
- [ICCV 2025] The official pytorch implement of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆22Oct 28, 2025Updated 4 months ago
- Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing …☆49Sep 18, 2024Updated last year
- The official implementation of ADDP (ICLR 2024)☆12Mar 27, 2024Updated last year
- ☆24May 13, 2025Updated 9 months ago
- ☆16Apr 4, 2025Updated 10 months ago
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024☆18Oct 11, 2024Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆43Mar 11, 2025Updated 11 months ago
- ☆19Dec 6, 2023Updated 2 years ago
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆61May 2, 2025Updated 10 months ago
- ☆124Jul 29, 2024Updated last year
- Dense Receptive Field For Object Detection☆25Nov 26, 2018Updated 7 years ago
- ☆32Mar 25, 2024Updated last year
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…☆12Oct 14, 2024Updated last year
- Adapting LLaMA Decoder to Vision Transformer☆30May 20, 2024Updated last year
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos☆43Nov 5, 2025Updated 3 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition☆86Apr 6, 2025Updated 10 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆94Dec 1, 2025Updated 3 months ago
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 2 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- ROS packages for control of an autonomous Renault Twizy at the Department of Electrical Engineering, Chalmers University of Technology, S…☆11May 30, 2021Updated 4 years ago
- Unofficial implement of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection☆33Apr 18, 2022Updated 3 years ago
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction☆41Sep 15, 2025Updated 5 months ago
- [CVPR 2023]Implementation of Siamese Image Modeling for Self-Supervised Vision Representation Learning☆41Jun 6, 2024Updated last year
- pytorch implementation for CariGANS☆33Mar 27, 2019Updated 6 years ago
- Building open-ended embodied agent in battle royale FPS game☆38Feb 6, 2024Updated 2 years ago
- A Hierarchical Approach for Generating Descriptive Image Paragraphs☆10Mar 27, 2020Updated 5 years ago
- Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information☆11Sep 28, 2023Updated 2 years ago
- Code for "Sample-efficient Deep Reinforcement Learning of Mobile Manipulation for 6-DOF Trajectory Following"☆13Mar 19, 2025Updated 11 months ago
- Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relati…☆41Apr 19, 2024Updated last year
- DL Backtrace is a new explainablity technique for deep learning models that works for any modality and model type.☆23Feb 16, 2026Updated 2 weeks ago
- ☆10Apr 7, 2025Updated 10 months ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆18Jul 10, 2025Updated 7 months ago
- This GitHub repository contains converted models in ONNX, TensorRT, and PyTorch formats, along with inference scripts and demos. These mo…☆14Aug 28, 2023Updated 2 years ago
- The implementation codes of paper: Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning☆18May 8, 2025Updated 9 months ago
- [IEEE TIP] Offical implementation for the work "BadCM: Invisible Backdoor Attack against Cross-Modal Learning".☆14Aug 30, 2024Updated last year
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- ☆11Jan 18, 2025Updated last year
- ☆45Aug 14, 2023Updated 2 years ago