kyegomez / VisionLLaMA

Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta

☆16

Alternatives and similar repositories for VisionLLaMA

Users who are interested in VisionLLaMA are comparing it to the libraries listed below

roboflow / cvevals
Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…
☆36 · Updated last year
kyegomez / MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
☆24 · Updated 2 weeks ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆23 · Updated last year
apple / ml-mofi
☆59 · Updated last year
autodistill / autodistill-grounded-edgesam
EdgeSAM model for use with Autodistill.
☆27 · Updated last year
huggingface / pixparse
Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data
☆21 · Updated last year
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆33 · Updated last year
SkalskiP / SoM
Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️
☆87 · Updated last year
kyegomez / Kosmos-X
The Next Generation Multi-Modality Superintelligence
☆70 · Updated 11 months ago
borisdayma / sora-mini
☆17 · Updated last year
facebookresearch / NeuralMemory
A Data Source for Reasoning Embodied Agents
☆19 · Updated last year
capjamesg / sam-clip
Use Grounding DINO, Segment Anything, and CLIP to label objects in images.
☆31 · Updated last year
GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆60 · Updated 8 months ago
gregor-ge / mBLIP
☆86 · Updated last year
togethercomputer / Dragonfly
☆77 · Updated 9 months ago
kyegomez / Qwen-VL
My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…
☆12 · Updated last year
kyegomez / TinyGPTV
Simple implementation of TinyGPTV in super simple Zeta LEGO blocks
☆16 · Updated 8 months ago
lucidrains / mind-evolution
Implementation of Mind Evolution, from the paper "Evolving Deeper LLM Thinking" by DeepMind
☆56 · Updated 2 months ago
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability.
☆94 · Updated 7 months ago
data2ml / all-clip
Load any CLIP model with a standardized interface
☆21 · Updated last year
neulab / MultiUI
Code for the paper: "Harnessing Webpage UIs for Text-Rich Visual Understanding"
☆52 · Updated 7 months ago
qnguyen3 / hermes-llava
☆54 · Updated last year
docugami / DFM-benchmarks
Benchmarks for Business Document Foundation Models
☆10 · Updated last year
Upaya07 / NeurIPS-llm-efficiency-challenge
Code for NeurIPS LLM Efficiency Challenge
☆59 · Updated last year
alenic / timm-models-explorer
Timm model explorer
☆41 · Updated last year
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28 · Updated last year
kyegomez / Finetuning-Suite
Fine-tune any model on Hugging Face (HF) in less than 30 seconds
☆57 · Updated 2 weeks ago
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆36 · Updated last year
facebookresearch / MultiModalExplorer
Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll…
☆27 · Updated last year
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆84 · Updated 5 months ago