ContextualAI / lensLinks

This is the official repository for the LENS (Large Language Models Enhanced to See) system.

☆352

Alternatives and similar repositories for lens

Users that are interested in lens are comparing it to the libraries listed below

Sorting:

kohjingyu / gill
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
☆459Updated last year
facebookresearch / LaViLa
Code release for "Learning Video Representations from Large Language Models"
☆528Updated last year
mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
☆729Updated 3 months ago
kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆482Updated last year
kyegomez / CM3Leon
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal …
☆362Updated last year
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆246Updated 6 months ago
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆653Updated last year
UCSC-VLAA / CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
☆316Updated last year
Vision-CAIR / ChatCaptioner
Official Repository of ChatCaptioner
☆463Updated 2 years ago
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆269Updated last year
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆489Updated 11 months ago
LLaVA-VL / LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
☆375Updated last year
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆306Updated 6 months ago
luogen1996 / LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
☆522Updated last year
kyegomez / PALI3
Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145Updated 2 weeks ago
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆752Updated last year
jshilong / GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆539Updated 2 months ago
fabawi / ImageBind-LoRA
Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA
☆186Updated last year
HenryHZY / Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
☆357Updated last year
allenai / unified-io-2
☆621Updated last year
MILVLG / imp
a family of highly capabale yet efficient large multimodal models
☆186Updated 11 months ago
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆327Updated last year
SHI-Labs / VCoder
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
☆278Updated last year
allenai / unified-io-inference
☆227Updated last year
kongds / E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
☆262Updated 7 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆404Updated 5 months ago
HyperGAI / HPT
HPT - Open Multimodal LLMs from HyperGAI
☆315Updated last year
AILab-CVC / SEED-Bench
(CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆345Updated 6 months ago
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆257Updated last year
mshukor / UnIVAL
[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
☆228Updated last year