Delve-ERAV1 / Phi-2-Vision-Language

Pretraining and finetuning for visual instruction following with Mixture of Experts

☆16

Alternatives and similar repositories for Phi-2-Vision-Language

Users who are interested in Phi-2-Vision-Language are comparing it to the repositories listed below.

wjbmattingly / qwen2-vl-finetune-huggingface
This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.
☆73 Updated 3 weeks ago
GaiZhenbiao / Phi3V-Finetuning
Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
☆58 Updated last year
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆37 Updated last year
ariG23498 / fine-tune-paligemma
Notebooks for fine-tuning PaliGemma
☆112 Updated 3 months ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆84 Updated 5 months ago
alexander-moore / vlm
Composition of Multimodal Language Models From Scratch
☆15 Updated 11 months ago
geronimi73 / 3090_shorts
minimal scripts for 24GB VRAM GPUs. training, inference, whatever
☆41 Updated last month
neulab / Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
☆110 Updated last month
wolfecameron / lora_instruction_tune
☆40 Updated last year
illuin-tech / vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
☆223 Updated this week
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated last year
shreydan / VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
☆43 Updated last year
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆234 Updated last year
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆207 Updated last year
KyujinHan / Sakura-SOLAR-DPO
Sakura-SOLAR-DPO: Merge, SFT, and DPO
☆116 Updated last year
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆232 Updated 9 months ago
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆33 Updated 3 months ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆71 Updated last year
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆49 Updated 5 months ago
swj0419 / detect-pretrain-code-contamination
☆76 Updated last year
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆94 Updated 7 months ago
huggingface / gpt-oss-recipes
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
☆222 Updated this week
ariG23498 / gemma3-object-detection
Fine tune Gemma 3 on an object detection task
☆74 Updated 3 weeks ago
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆230 Updated 9 months ago
s-smits / modernbert-finetune
Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training
☆67 Updated 6 months ago
AIAnytime / Fine-Tuning-Multimodal-LLM
Fine Tuning Multimodal LLM "Idefics 9B" on Pokemon Go Dataset available on Hugging Face.
☆19 Updated last year
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆158 Updated last year
jakespringer / echo-embeddings
☆152 Updated last year
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆94 Updated 4 months ago
adithya-s-k / YoloGemma
Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…
☆82 Updated last year