ContextualAI / lens
This is the official repository for the LENS (Large Language Models Enhanced to See) system.
⭐ 351 · Updated last year
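For context on what is being compared: LENS pairs a frozen LLM with off-the-shelf frozen vision modules whose outputs are rendered as text for the LLM to reason over. Below is a minimal usage sketch, assuming the `llm-lens` package from this repository exposes `Lens` and `LensProcessor` roughly as in its README; the image URL is a hypothetical placeholder and exact signatures may differ across versions.

```python
# Hedged sketch: assumes `pip install llm-lens` provides `lens.Lens` and
# `lens.LensProcessor` (per this repo's README); signatures may differ.
import requests
from PIL import Image
from lens import Lens, LensProcessor

lens_model = Lens()          # frozen vision modules + prompt assembly
processor = LensProcessor()

img_url = "https://example.com/cat.jpg"  # hypothetical image URL
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

question = "What is the image about?"
samples = processor([raw_image], [question])  # build the visual-description prompt
output = lens_model(samples)                  # text a frozen LLM can consume
print(output)
```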
Alternatives and similar repositories for lens
Users who are interested in lens are comparing it to the libraries listed below.
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ⭐ 456 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐ 487 · Updated 10 months ago
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ⭐ 533 · Updated 3 weeks ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐ 889 · Updated 2 weeks ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ⭐ 267 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ⭐ 520 · Updated last year
- DataComp: In search of the next generation of multimodal datasets ⭐ 719 · Updated last month
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ⭐ 314 · Updated last year
- Official Repository of ChatCaptioner ⭐ 464 · Updated 2 years ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ⭐ 482 · Updated last year
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ⭐ 199 · Updated 5 months ago
- LLaVA-Interactive-Demo ⭐ 374 · Updated 10 months ago
- An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ⭐ 361 · Updated last year
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ⭐ 145 · Updated 2 months ago
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ⭐ 629 · Updated last year
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ⭐ 343 · Updated 5 months ago
- ⭐ 782 · Updated 11 months ago
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV 2024) ⭐ 327 · Updated last year
- Code release for "Learning Video Representations from Large Language Models" ⭐ 524 · Updated last year
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ⭐ 528 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ⭐ 744 · Updated last year
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ⭐ 324 · Updated 11 months ago
- [Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet. ⭐ 811 · Updated 2 years ago
- Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA ⭐ 184 · Updated last year
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ⭐ 244 · Updated 5 months ago
- [ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ⭐ 238 · Updated 10 months ago
- A family of highly capable yet efficient large multimodal models ⭐ 185 · Updated 10 months ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ⭐ 256 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐ 314 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ⭐ 630 · Updated 4 months ago