shreydan / VisionGPT2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.

☆43

Alternatives and similar repositories for VisionGPT2

Users who are interested in VisionGPT2 are comparing it to the repositories listed below.

nivibilla / build-nanogpt
Video+code lecture on building nanoGPT from scratch
☆69 Updated last year
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆231 Updated last year
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆84 Updated 5 months ago
Locutusque / TinyMistral-train-eval
The training notebooks that were similar to the original script used to train TinyMistral.
☆22 Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 3 months ago
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆232 Updated 9 months ago
QuixiAI / grokadamw
☆134 Updated 11 months ago
tensoic / Cerule
Cerule - A Tiny Mighty Vision Model
☆66 Updated 11 months ago
NousResearch / Obsidian
Maybe the new state of the art vision model? we'll see 🤷‍♂️
☆167 Updated last year
sshh12 / multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
☆185 Updated last year
wjbmattingly / qwen2-vl-finetune-huggingface
This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.
☆73 Updated 3 weeks ago
catid / dora
Implementation of DoRA
☆301 Updated last year
VatsaDev / nanoChatGPT
nanogpt turned into a chat model
☆70 Updated last year
neulab / Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
☆110 Updated last month
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆56 Updated last year
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆156 Updated last year
Delve-ERAV1 / Phi-2-Vision-Language
Pretraining and finetuning for visual instruction following with Mixture of Experts
☆16 Updated last year
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆207 Updated last year
ritabratamaiti / AnyModal
AnyModal is a Flexible Multimodal Language Model Framework for PyTorch
☆101 Updated 7 months ago
adithya-s-k / YoloGemma
Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…
☆82 Updated last year
qwopqwop200 / gptqlora
GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ
☆103 Updated 2 years ago
melisa-writer / short-transformers
Prune transformer layers
☆69 Updated last year
jadechip / nanoXLSTM
The simplest, fastest repository for training/finetuning medium-sized xLSTMs.
☆41 Updated last year
CERC-AAI / Robin
☆63 Updated 10 months ago
hkproj / multi-latent-attention
☆43 Updated 2 months ago
Jaykef / ai-algorithms
First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting…
☆177 Updated 2 weeks ago
ariG23498 / quantized-diffusion-inference
Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs
☆38 Updated 9 months ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆117 Updated 6 months ago
ariG23498 / fine-tune-paligemma
Notebooks for fine-tuning PaliGemma
☆112 Updated 3 months ago
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆117 Updated 2 weeks ago