mynameischaos / LionView external linksLinks
Lion: Kindling Vision Intelligence within Large Language Models
☆51Jan 25, 2024Updated 2 years ago
Alternatives and similar repositories for Lion
Users that are interested in Lion are comparing it to the libraries listed below
Sorting:
- ☆88Jul 4, 2024Updated last year
- Official implementation for "Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval"☆25Oct 27, 2025Updated 3 months ago
- ☆23Jan 8, 2024Updated 2 years ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- Large Multimodal Model☆15Apr 8, 2024Updated last year
- ☆21Feb 29, 2024Updated last year
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU☆360Dec 18, 2023Updated 2 years ago
- ☆19Dec 6, 2023Updated 2 years ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆38Sep 9, 2024Updated last year
- A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations…☆12Mar 11, 2024Updated last year
- Various test models in WNNX format. It can view with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- ☆29May 13, 2024Updated last year
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆146Jan 17, 2026Updated last month
- ☆11Jan 8, 2025Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆79Jun 17, 2024Updated last year
- A collection of visual instruction tuning datasets.☆76Mar 14, 2024Updated last year
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆46Jun 9, 2025Updated 8 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆153Sep 3, 2025Updated 5 months ago
- Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models☆206Jan 8, 2025Updated last year
- A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long…☆18Sep 12, 2025Updated 5 months ago
- [ECCV 2022] MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes official implementation☆16Feb 2, 2023Updated 3 years ago
- Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"☆15Jan 25, 2024Updated 2 years ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆173Sep 25, 2024Updated last year
- ☆58Aug 7, 2023Updated 2 years ago
- [ICME 2023 Oral, Extended to TIP (UR)] The best zero-shot VQA approach that even outperforms several fully-supervised methods.☆40Jul 11, 2023Updated 2 years ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆36Aug 8, 2024Updated last year
- A PyTorch implementation of ACRNet based on ICME 2023 paper "Weakly-supervised Temporal Action Localization with Adaptive Clustering and …☆15Aug 29, 2023Updated 2 years ago
- paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/☆270Aug 9, 2023Updated 2 years ago
- [SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images☆44Nov 19, 2025Updated 2 months ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions☆260Apr 14, 2024Updated last year
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner.☆20Dec 1, 2023Updated 2 years ago
- Code for ICCV 2023 paper ✨ "StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Mo…☆18Jan 25, 2024Updated 2 years ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…☆557Apr 21, 2024Updated last year
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆37Oct 18, 2023Updated 2 years ago
- ☆805Jul 8, 2024Updated last year
- [ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant☆246Aug 14, 2024Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …☆504Aug 9, 2024Updated last year
- a family of highly capabale yet efficient large multimodal models☆191Aug 23, 2024Updated last year
- Genshin Impact Dataset (GID) for SLAM☆26Mar 13, 2024Updated last year