mbzuai-oryx / PALO

(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.

☆84

Alternatives and similar repositories for PALO

Users who are interested in PALO compare it to the repositories listed below.

neulab / Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
☆110 Updated last month
gregor-ge / mBLIP
☆86 Updated last year
mbzuai-oryx / ALM-Bench
[CVPR 2025 🔥] ALM-Bench is a multilingual multi-modal diverse cultural benchmark for 100 languages across 19 categories. It assesses the…
☆43 Updated 2 months ago
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆94 Updated 7 months ago
LAION-AI / General-GPT
☆65 Updated last year
mfarre / Video-LLaVA-7B-hf-CinePile
Video-LLaVA fine-tune for CinePile evaluation
☆51 Updated last year
prometheus-eval / prometheus-vision
[ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific…
☆74 Updated 10 months ago
XiaoduoAILab / XmodelVLM
☆69 Updated last year
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60 Updated 5 months ago
ByungKwanLee / TroL
[EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…
☆97 Updated last year
google / imageinwords
Data release for the ImageInWords (IIW) paper.
☆216 Updated 8 months ago
togethercomputer / Dragonfly
☆77 Updated 9 months ago
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆152 Updated last year
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆246 Updated 6 months ago
CERC-AAI / Robin
☆63 Updated 10 months ago
ByungKwanLee / Phantom
[Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …
☆60 Updated 10 months ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145 Updated 2 weeks ago
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆138 Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆112 Updated 6 months ago
MILVLG / imp
A family of highly capable yet efficient large multimodal models
☆186 Updated 11 months ago
Qichuzyy / POA
Official implementation of ECCV24 paper: POA
☆24 Updated last year
FudanNLPLAB / MouSi
☆73 Updated last year
neulab / MultiUI
Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding
☆52 Updated 7 months ago
Victorwz / LLaVA-Llama-3
Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.
☆66 Updated 9 months ago
kongds / E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
☆262 Updated 7 months ago
apple / ml-rpm-bench
☆41 Updated last year
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆209 Updated 7 months ago
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆117 Updated this week
kyegomez / MC-ViT
Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
☆22 Updated last week
aimagelab / LLaVA-MORE
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
☆145 Updated this week