mightyzau/InfMLLM

☆19

Alternatives and similar repositories for InfMLLM

Users who are interested in InfMLLM are comparing it to the libraries listed below.

TencentARC / TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
☆16 · Jun 20, 2023 · Updated 3 years ago
buptlihang / CVLM
☆23 · Jan 8, 2024 · Updated 2 years ago
archiki / RepARe
☆21 · Oct 10, 2023 · Updated 2 years ago
UniAdapter / UniAdapter
☆28 · Mar 20, 2023 · Updated 3 years ago
mightyzau / RegionBLIP
☆59 · Aug 7, 2023 · Updated 2 years ago
xing0047 / rewrite
[NeurIPS 2023] Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
☆21 · Jan 3, 2024 · Updated 2 years ago
opendatalab / image-downloader
☆31 · May 13, 2024 · Updated 2 years ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆77 · Mar 14, 2024 · Updated 2 years ago
V3Det / mmdetection-V3Det
OpenMMLab Detection Toolbox and Benchmark for V3Det
☆15 · Apr 3, 2024 · Updated 2 years ago
CASIA-LMC-Lab / Obj2Seq
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)
☆85 · Nov 2, 2022 · Updated 3 years ago
PCIResearch / TransCore-M
Large Multimodal Model
☆15 · Apr 8, 2024 · Updated 2 years ago
wwwfan628 / DA-AIM
DA-AIM: Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection
☆12 · Oct 6, 2022 · Updated 3 years ago
SimarKareer / UnifiedVideoDA
We're Not Using Videos Effectively (TMLR 2024)
☆17 · Feb 4, 2024 · Updated 2 years ago
YuchenLiu98 / COMM
Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
☆211 · Jan 8, 2025 · Updated last year
ThomasMrY / VCT
[NeurIPS 2022] code for "Visual Concepts Tokenization"
☆23 · Oct 10, 2022 · Updated 3 years ago
alibaba / conv-llava
☆128 · Jul 29, 2024 · Updated last year
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆71 · Oct 15, 2024 · Updated last year
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆167 · Jun 20, 2024 · Updated 2 years ago
QUVA-Lab / PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26 · Jan 14, 2025 · Updated last year
X2FD / LVIS-INSTRUCT4V
☆134 · Dec 22, 2023 · Updated 2 years ago
mshukor / EvALign-ICL
[ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context …
☆22 · Mar 1, 2024 · Updated 2 years ago
hoangtuanvu / conformer_ocr
Transformer OCR is an Optical Character Recognition toolkit built for researchers working on OCR for both Vietnamese and English. This…
☆10 · Dec 27, 2021 · Updated 4 years ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59 · Jun 27, 2023 · Updated 3 years ago
Liac-li / MM-self-improve-qwen2vl
☆13 · Dec 9, 2024 · Updated last year
HenryHZY / VL-PET
[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
☆53 · Sep 21, 2023 · Updated 2 years ago
ModelTC / OmniBal
[ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv…
☆27 · Jun 16, 2025 · Updated last year
palchenli / VL-Instruction-Tuning
☆90 · Nov 25, 2023 · Updated 2 years ago
mlpc-ucsd / MasQCLIP
(ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation
☆37 · Oct 18, 2023 · Updated 2 years ago
impiga / Plain-DETR
[ICCV2023] DETR Doesn’t Need Multi-Scale or Locality Design
☆232 · Nov 14, 2023 · Updated 2 years ago
bytedance / OmniScient-Model
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
☆102 · Jul 15, 2024 · Updated 2 years ago
rshaojimmy / DeepFake-Adapter
[IJCV 2025] Code for DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection
☆63 · Dec 24, 2024 · Updated last year
ChenDelong1999 / polite-flamingo
🦩 Official repository of paper "Visual Instruction Tuning with Polite Flamingo" (AAAI-24 Oral)
☆65 · Dec 9, 2023 · Updated 2 years ago
hyperrixel / infinitybatch
PyTorch tool for training with bigger batch size on the GPU
☆11 · Feb 26, 2021 · Updated 5 years ago
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆46 · Jun 9, 2025 · Updated last year
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆72 · Feb 11, 2025 · Updated last year
amazon-science / prompt-pretraining
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆259 · May 3, 2024 · Updated 2 years ago
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51 · Jan 25, 2024 · Updated 2 years ago
bajibabu / GlottGAN
This repository contains the files used for our Interspeech 2017 paper.
☆16 · May 30, 2017 · Updated 9 years ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆109 · Jul 18, 2025 · Updated last year