Lion: Kindling Vision Intelligence within Large Language Models
☆51 · Jan 25, 2024 · Updated 2 years ago
Alternatives and similar repositories for Lion
Users who are interested in Lion are comparing it to the repositories listed below.
- Official implementation for "Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval" ☆25 · Oct 27, 2025 · Updated 5 months ago
- ☆90 · Jul 4, 2024 · Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- Large Multimodal Model ☆15 · Apr 8, 2024 · Updated 2 years ago
- ☆23 · Jan 8, 2024 · Updated 2 years ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU ☆360 · Dec 18, 2023 · Updated 2 years ago
- ☆19 · Dec 6, 2023 · Updated 2 years ago
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner. ☆20 · Dec 1, 2023 · Updated 2 years ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆84 · Jun 17, 2024 · Updated last year
- M4 experiment logbook ☆58 · Aug 21, 2023 · Updated 2 years ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆146 · Updated this week
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆153 · Sep 3, 2025 · Updated 7 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ☆38 · Sep 9, 2024 · Updated last year
- ☆21 · Feb 29, 2024 · Updated 2 years ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆95 · Jan 7, 2025 · Updated last year
- ☆59 · Aug 7, 2023 · Updated 2 years ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆176 · Sep 25, 2024 · Updated last year
- A collection of visual instruction tuning datasets. ☆77 · Mar 14, 2024 · Updated 2 years ago
- A PyTorch implementation of ACRNet based on ICME 2023 paper "Weakly-supervised Temporal Action Localization with Adaptive Clustering and … ☆15 · Aug 29, 2023 · Updated 2 years ago
- [ICME 2023 Oral, Extended to TIP (UR)] The best zero-shot VQA approach that even outperforms several fully-supervised methods. ☆41 · Jul 11, 2023 · Updated 2 years ago
- Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?" ☆15 · Jan 25, 2024 · Updated 2 years ago
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆22 · Jul 2, 2024 · Updated last year
- paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/ ☆270 · Aug 9, 2023 · Updated 2 years ago
- ☆807 · Jul 8, 2024 · Updated last year
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆37 · Oct 18, 2023 · Updated 2 years ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆260 · Apr 14, 2024 · Updated 2 years ago
- ☆29 · May 13, 2024 · Updated last year
- [ECCV 2022] MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes official implementation ☆16 · Feb 2, 2023 · Updated 3 years ago
- [ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆248 · Aug 14, 2024 · Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format. ☆26 · Feb 22, 2024 · Updated 2 years ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding ☆59 · Mar 24, 2026 · Updated 3 weeks ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆561 · Apr 21, 2024 · Updated last year
- DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format ☆12 · Dec 7, 2019 · Updated 6 years ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆22 · Oct 25, 2023 · Updated 2 years ago
- The Official PyTorch implementation of "Part Aware Contrastive Learning for Self-Supervised Action Recognition" in IJCAI 2023 ☆13 · Nov 9, 2023 · Updated 2 years ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ☆506 · Aug 9, 2024 · Updated last year
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆36 · Aug 8, 2024 · Updated last year
- Colorful Prompt Tuning for Pre-trained Vision-Language Models ☆49 · Nov 1, 2022 · Updated 3 years ago
- 📦 A lightweight machine learning toolkit for researchers, providing common model design & learning functionalities. ☆29 · Jul 2, 2025 · Updated 9 months ago