ContextualAI / lens
This is the official repository for the LENS (Large Language Models Enhanced to See) system.
⭐ 352 · Updated last month
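For orientation, LENS couples frozen vision modules with a frozen LLM: taggers, attribute classifiers, and captioners describe an image in plain text, and the language model answers questions over those descriptions without any multimodal training. Below is a minimal sketch of that flow; the helper names (`vision_modules`, `llm`, `build_lens_prompt`, `lens_answer`) are hypothetical placeholders for illustration, not the repository's actual API.

```python
# Minimal sketch of the LENS idea (hypothetical names, not the repo's API):
# frozen vision modules turn an image into text, and a frozen LLM reasons
# over that text. The LLM never sees pixels.

def build_lens_prompt(tags, attributes, captions, question):
    """Assemble the vision modules' textual outputs into one prompt."""
    return (
        f"Tags: {', '.join(tags)}\n"
        f"Attributes: {', '.join(attributes)}\n"
        f"Captions: {' '.join(captions)}\n"
        f"Question: {question}\n"
        "Short answer:"
    )

def lens_answer(image, question, vision_modules, llm):
    """Answer a visual question by prompting an LLM with image descriptions."""
    tags = vision_modules["tagger"](image)            # e.g. ["dog", "frisbee"]
    attributes = vision_modules["attributes"](image)  # e.g. ["brown furry dog"]
    captions = vision_modules["captioner"](image)     # e.g. ["A dog leaps ..."]
    prompt = build_lens_prompt(tags, attributes, captions, question)
    return llm(prompt)                                # plain text in, text out
```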
Alternatives and similar repositories for lens
Users interested in lens are comparing it to the repositories listed below.
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ⭐ 460 · Updated last year
- Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ⭐ 482 · Updated last year
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions". ⭐ 246 · Updated 7 months ago
- Code/data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding". ⭐ 269 · Updated last year
- Official repository of ChatCaptioner. ⭐ 465 · Updated 2 years ago
- [NeurIPS 2023] Official implementation of "An Inverse Scaling Law for CLIP Training". ⭐ 316 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models". ⭐ 522 · Updated last year
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest. ⭐ 540 · Updated 2 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐ 494 · Updated last year
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs". ⭐ 667 · Updated last year
- Code release for "Learning Video Representations from Large Language Models". ⭐ 530 · Updated last year
- DataComp: In search of the next generation of multimodal datasets. ⭐ 734 · Updated 3 months ago
- Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger". ⭐ 145 · Updated last week
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024). ⭐ 307 · Updated 7 months ago
- An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal … ⭐ 363 · Updated last year
- HPT: Open Multimodal LLMs from HyperGAI. ⭐ 315 · Updated last year
- Fine-tuning "ImageBind: One Embedding Space to Bind Them All" with LoRA. ⭐ 190 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills. ⭐ 757 · Updated last year
- LLaVA-Interactive-Demo. ⭐ 377 · Updated last year
- Official code for "TOAST: Transfer Learning via Attention Steering". ⭐ 188 · Updated 2 years ago
- E5-V: Universal Embeddings with Multimodal Large Language Models. ⭐ 263 · Updated 8 months ago
- [CVPR 2024] A benchmark for evaluating multimodal LLMs using multiple-choice questions. ⭐ 346 · Updated 7 months ago
- Code used for the creation of OBELICS, an open, massive, and curated collection of interleaved image-text web documents, containing 141M documents. ⭐ 207 · Updated 11 months ago
- Research Trends in LLM-guided Multimodal Learning. ⭐ 356 · Updated last year
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts. ⭐ 330 · Updated last year
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer. ⭐ 384 · Updated 4 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models. ⭐ 278 · Updated last year
- [TMLR 2023] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ⭐ 229 · Updated last year