2U1 / Gemma3-Finetune

An open-source implementation for Gemma3 series by Google.

☆58

Alternatives and similar repositories for Gemma3-Finetune

Users who are interested in Gemma3-Finetune are comparing it to the repositories listed below.

2U1 / SmolVLM-Finetune
An open-source implementation for fine-tuning SmolVLM.
☆57 Updated 2 months ago
2U1 / Llama3.2-Vision-Finetune
An open-source implementation for fine-tuning Llama3.2-Vision series by Meta.
☆172 Updated last month
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96 Updated 11 months ago
si0wang / ThinkLite-VL
☆105 Updated 5 months ago
anyantudre / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…
☆116 Updated last year
ByungKwanLee / Meteor
[NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…
☆116 Updated last year
2U1 / Molmo-Finetune
An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
☆58 Updated 6 months ago
UCSC-VLAA / OpenVision
[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
☆404 Updated 2 months ago
ByungKwanLee / DeepSick-R1
Reproduction of DeepSeek-R1
☆243 Updated 7 months ago
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆158 Updated last year
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆307 Updated 6 months ago
ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆96 Updated 5 months ago
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆162 Updated 10 months ago
Hon-Wong / VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
☆346 Updated 5 months ago
deepglint / RWKV-CLIP
[EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner
☆143 Updated 6 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68 Updated 7 months ago
2U1 / Phi3-Vision-Finetune
An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.
☆99 Updated last month
lucasjinreal / Namo-R1
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
☆239 Updated 7 months ago
deepglint / MVT
Margin-based Vision Transformer
☆55 Updated last month
kongds / E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
☆274 Updated 10 months ago
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆141 Updated 9 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168 Updated last year
juzhengz / LoRI
[COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
☆161 Updated 4 months ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆126 Updated last year
RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆274 Updated 9 months ago
wkcn / TinyCLIP
[ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
☆115 Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109 Updated 5 months ago
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series model developed by the BytedanceDouyinContent Group
☆75 Updated 2 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180 Updated last year
Letian2003 / MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…
☆32 Updated 5 months ago