kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

☆147

Alternatives and similar repositories for PALI3

Users who are interested in PALI3 are comparing it to the libraries listed below.

kyegomez / HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
☆15 · Jun 27, 2025 · Updated last year
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆95 · Mar 20, 2024 · Updated 2 years ago
kyegomez / MegaVIT
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
☆32 · Jun 22, 2026 · Updated last month
scenarios / WeMM
☆90 · Jul 4, 2024 · Updated 2 years ago
kyegomez / KosmosG
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"
☆13 · Nov 11, 2024 · Updated last year
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated 2 years ago
JiuTian-VL / JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
☆154 · Sep 3, 2025 · Updated 10 months ago
kyegomez / AlphaDev
Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult…
☆11 · Aug 29, 2023 · Updated 2 years ago
kyegomez / PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"
☆339 · Jan 29, 2024 · Updated 2 years ago
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51 · Jan 25, 2024 · Updated 2 years ago
Agora-Lab-AI / Atom
A suite of finetuned LLMs for atomically precise function calling 🧪
☆16 · Updated this week
kyegomez / CM3Leon
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal …
☆365 · Dec 15, 2023 · Updated 2 years ago
kyegomez / GPT3
An implementation of the base GPT-3 model architecture from the OpenAI paper "Language Models are Few-Shot Learners"
☆22 · Jun 29, 2024 · Updated 2 years ago
autodistill / autodistill-sam-clip
SAM-CLIP module for use with Autodistill.
☆18 · Nov 21, 2023 · Updated 2 years ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 · May 14, 2026 · Updated 2 months ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆871 · Mar 8, 2025 · Updated last year
catfish132 / DiffusionRRG
☆10 · Aug 24, 2023 · Updated 2 years ago
Pillars-Creation / Visualglm-image-to-text
Adds some files missing from Visualglm so that Visualglm can be trained; the example performs physiognomy (face-reading) recognition on human faces.
☆13 · Jun 7, 2023 · Updated 3 years ago
kyegomez / MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
☆26 · Updated this week
kyegomez / Pegasus
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
☆14 · Oct 16, 2024 · Updated last year
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆76 · Sep 20, 2023 · Updated 2 years ago
kyegomez / Fuyu
Implementation of Adept's all-new Fuyu multi-modality model in PyTorch
☆24 · Nov 11, 2024 · Updated last year
kyegomez / Falcon
A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations…
☆12 · Mar 11, 2024 · Updated 2 years ago
kyegomez / Paper-Implementation-Template
A simple reproducible template to implement AI research papers
☆24 · Sep 9, 2024 · Updated last year
kyegomez / Qwen-VL
My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…
☆13 · Jan 29, 2024 · Updated 2 years ago
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆770 · Feb 1, 2024 · Updated 2 years ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago
kyegomez / COT-SC
Plug-and-Play Prompt Technique to Boost Model Reasoning by 40%
☆12 · May 30, 2023 · Updated 3 years ago
JIA-Lab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861 · Jul 29, 2024 · Updated last year
kyegomez / NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
☆274 · Updated this week
kyegomez / SayCan
Implementation of "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances" by Google
☆26 · Updated this week
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,685 · Aug 1, 2024 · Updated last year
kyegomez / MGQA
The open source implementation of multi-grouped query attention from the paper "GQA: Training Generalized Multi-Query Transformer Model…
☆17 · Dec 11, 2023 · Updated 2 years ago
InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921 · May 26, 2025 · Updated last year
mlfoundations / VisIT-Bench
☆51 · Oct 29, 2023 · Updated 2 years ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago
mlpc-ucsd / BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
☆261 · Apr 14, 2024 · Updated 2 years ago
kyegomez / MAGVIT2
Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION: TOKENIZER IS KEY TO VISUAL GENERATION"
☆15 · Nov 11, 2024 · Updated last year