2U1 / Gemma3-Finetune
An open-source implementation for the Gemma3 series by Google.
☆58 · Updated 4 months ago
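Since the repository targets fine-tuning the Gemma3 series, the sketch below shows what a minimal LoRA fine-tuning loop for a Gemma 3 checkpoint typically looks like with Hugging Face transformers, peft, and datasets. It is an illustration under assumed defaults, not this repository's API; the checkpoint name, toy dataset, and hyperparameters are placeholders.

```python
# A minimal, hypothetical sketch of LoRA fine-tuning for a Gemma 3 checkpoint with
# Hugging Face transformers + peft. It illustrates the general recipe such repos
# implement; it is NOT the Gemma3-Finetune repository's own API. Checkpoint name,
# dataset, and hyperparameters are placeholders.
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

model_id = "google/gemma-3-1b-it"  # gated on the Hub; swap in whichever Gemma 3 checkpoint you use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach LoRA adapters to the attention projections (a common default choice).
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

# Tiny toy dataset, purely for demonstration.
ds = load_dataset("Abirate/english_quotes", split="train[:200]")

def tokenize(batch):
    out = tokenizer(batch["quote"], truncation=True, max_length=128, padding="max_length")
    # Causal-LM targets mirror the inputs (pad positions would normally be masked
    # to -100; omitted here for brevity).
    out["labels"] = [ids.copy() for ids in out["input_ids"]]
    return out

ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)

args = TrainingArguments(output_dir="gemma3-lora", per_device_train_batch_size=1,
                         num_train_epochs=1, learning_rate=2e-4,
                         logging_steps=10, report_to="none")
Trainer(model=model, args=args, train_dataset=ds).train()
```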
Alternatives and similar repositories for Gemma3-Finetune
Users interested in Gemma3-Finetune are comparing it to the libraries listed below.
- An open-source implementation for fine-tuning SmolVLM. ☆57 · Updated 2 months ago
- An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta. ☆172 · Updated last month
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability. ☆96 · Updated 11 months ago
- ☆105 · Updated 5 months ago
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan… ☆116 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… ☆116 · Updated last year
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai. ☆58 · Updated 6 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆404 · Updated 2 months ago
- Reproduction of DeepSeek-R1 ☆243 · Updated 7 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆158 · Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆307 · Updated 6 months ago
- A Simple Framework of Small-scale LMMs for Video Understanding ☆96 · Updated 5 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆162 · Updated 10 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆346 · Updated 5 months ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆143 · Updated 6 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆68 · Updated 7 months ago
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft. ☆99 · Updated last month
- A real-time CPU VLM at 500M parameters that surpasses Moondream2 and SmolVLM; trains from scratch with ease. ☆239 · Updated 7 months ago
- Margin-based Vision Transformer ☆55 · Updated last month
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆274 · Updated 10 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support. ☆141 · Updated 9 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆168 · Updated last year
- [COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation ☆161 · Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆126 · Updated last year
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆274 · Updated 9 months ago
- [ICCV 2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆115 · Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 · Updated 5 months ago
- The SAIL-VL2 series model developed by the BytedanceDouyinContent Group ☆75 · Updated 2 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆180 · Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆32 · Updated 5 months ago