ModelTC/OmniBal

[ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance".

☆27

Alternatives and similar repositories for OmniBal

Users who are interested in OmniBal are comparing it to the repositories listed below.

CnFaker / LLaVA-SP
[ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".
☆22 · Oct 28, 2025 · Updated 4 months ago
ModelTC / EasyLLM
Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing …
☆49 · Sep 18, 2024 · Updated last year
ChangyaoTian / ADDP
The official implementation of ADDP (ICLR 2024)
☆12 · Mar 27, 2024 · Updated last year
MikeWangWZHL / dymu
☆24 · May 13, 2025 · Updated 9 months ago
dengandong / GroundMoRe
☆16 · Apr 4, 2025 · Updated 10 months ago
GeWu-Lab / Stepping-Stones
The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024
☆18 · Oct 11, 2024 · Updated last year
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆43 · Mar 11, 2025 · Updated 11 months ago
mightyzau / InfMLLM
☆19 · Dec 6, 2023 · Updated 2 years ago
PKU-YuanGroup / LLMBind
LLMBind: A Unified Modality-Task Integration Framework
☆19 · Jun 16, 2024 · Updated last year
sunsmarterjie / ChatterBox
[AAAI 2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆61 · May 2, 2025 · Updated 10 months ago
alibaba / conv-llava
☆124 · Jul 29, 2024 · Updated last year
yqyao / DRFNet
Dense Receptive Field For Object Detection
☆25 · Nov 26, 2018 · Updated 7 years ago
cv516Buaa / OV-VG
☆32 · Mar 25, 2024 · Updated last year
Zhao-Jianing-SUDA / Hawkeye
The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…
☆12 · Oct 14, 2024 · Updated last year
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated last year
Tanveer81 / ReVisionLLM
This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
☆43 · Nov 5, 2025 · Updated 3 months ago
ChenhongyiYang / PlainMamba
[BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition
☆86 · Apr 6, 2025 · Updated 10 months ago
AFeng-x / Draw-and-Understand
[ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆94 · Dec 1, 2025 · Updated 3 months ago
VoyageWang / VG-Refiner
The repository for the VG-Refiner paper
☆17 · Dec 9, 2025 · Updated 2 months ago
gabfstr / DiffusionTrack
Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking
☆13 · Apr 12, 2023 · Updated 2 years ago
OssianEriksson / autonomous-twizy
ROS packages for control of an autonomous Renault Twizy at the Department of Electrical Engineering, Chalmers University of Technology, S…
☆11 · May 30, 2021 · Updated 4 years ago
Sharpiless / Pix2seq-mmdetection
Unofficial implementation of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection
☆33 · Apr 18, 2022 · Updated 3 years ago
Kami-code / HandsOnVLM-release
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
☆41 · Sep 15, 2025 · Updated 5 months ago
OpenGVLab / Siamese-Image-Modeling
[CVPR 2023] Implementation of Siamese Image Modeling for Self-Supervised Vision Representation Learning
☆41 · Jun 6, 2024 · Updated last year
PaParaZz1 / CariGANs
PyTorch implementation of CariGANs
☆33 · Mar 27, 2019 · Updated 6 years ago
opendilab / OpenPaL
Building an open-ended embodied agent in a battle royale FPS game
☆38 · Feb 6, 2024 · Updated 2 years ago
jack1yang / image-paragraph-captioning
A Hierarchical Approach for Generating Descriptive Image Paragraphs
☆10 · Mar 27, 2020 · Updated 5 years ago
iclr2024mcmi / ICLRMCMI
Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information
☆11 · Sep 28, 2023 · Updated 2 years ago
IRMV-Manipulation-Group / TF-HER
Code for "Sample-efficient Deep Reinforcement Learning of Mobile Manipulation for 6-DOF Trajectory Following"
☆13 · Mar 19, 2025 · Updated 11 months ago
mlvlab / SpeaQ
Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relati…
☆41 · Apr 19, 2024 · Updated last year
Lexsi-Labs / DLBacktrace
DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type.
☆23 · Feb 16, 2026 · Updated 2 weeks ago
mbzuai-oryx / TrackingMeetsLMM
☆10 · Apr 7, 2025 · Updated 10 months ago
josephzpng / DisTime
DisTime: Distribution-based Time Representation for Video Large Language Models.
☆18 · Jul 10, 2025 · Updated 7 months ago
weboccult-ai / onnx-model-zoo
This GitHub repository contains converted models in ONNX, TensorRT, and PyTorch formats, along with inference scripts and demos. These mo…
☆14 · Aug 28, 2023 · Updated 2 years ago
kiva12138 / MIMRL
Implementation code for the paper "Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning"
☆18 · May 8, 2025 · Updated 9 months ago
xandery-geek / BadCM
[IEEE TIP] Official implementation for the work "BadCM: Invisible Backdoor Attack against Cross-Modal Learning".
☆14 · Aug 30, 2024 · Updated last year
LinfengYuan1997 / LoSh
[CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
☆13 · Jun 17, 2024 · Updated last year
LinglingCai0314 / FreeMask
☆11 · Jan 18, 2025 · Updated last year
YYJMJC / LOUPE
☆45 · Aug 14, 2023 · Updated 2 years ago