RustamyF / clip-multimodal-ml

☆67

Alternatives and similar repositories for clip-multimodal-ml

Users who are interested in clip-multimodal-ml are comparing it to the libraries listed below.

2U1 / Llama3.2-Vision-Finetune
An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.
☆172 Updated last month
Montinger / Transformer-Workbench
Playground for Transformers
☆53 Updated last year
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96 Updated 11 months ago
dino-chiio / blip-vqa-finetune
An implementation of fine-tuning the BLIP model for Visual Question Answering.
☆83 Updated last year
GaiZhenbiao / Phi3V-Finetuning
Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
☆58 Updated last year
matthewchung74 / qwen_2_5_3B_GRPO_medical_thinking
☆47 Updated 7 months ago
riedlerm / multimodal_rag_for_industry
Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications
☆64 Updated last year
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆141 Updated 9 months ago
mishra-18 / ML-Models
☆46 Updated 4 months ago
marslanm / Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b…
☆81 Updated 5 months ago
wjbmattingly / qwen2-vl-finetune-huggingface
This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.
☆77 Updated 4 months ago
FreedomIntelligence / Apollo
Multilingual Medicine: Model, Dataset, Benchmark, Code
☆197 Updated last year
LinWeizheDragon / FLMR
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆101 Updated 5 months ago
zhangfaen / finetune-InternVL2
☆30 Updated last year
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆250 Updated last year
b-hahn / CLIP
Finetuning CLIP for Few Shot Learning
☆46 Updated 3 years ago
stanfordmlgroup / ManyICL
☆145 Updated last year
nadsoft-opensource / RAG-with-open-source-multi-modal
☆20 Updated last year
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆214 Updated last year
mbzuai-oryx / AIN
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding…
☆49 Updated 8 months ago
wkcn / TinyCLIP
[ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
☆116 Updated last year
WatchTower-Liu / VLM-learning
Building a VLM, starting from the basic modules.
☆18 Updated last year
microsoft / LLM2CLIP
LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.
☆563 Updated 4 months ago
alipay / Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
☆169 Updated last month
YixiangCh / MRD-RAG
☆27 Updated 4 months ago
abachaa / MEDEC
☆37 Updated 5 months ago
MonolithFoundation / Bumblebee
A simple 14B MLLM that surpasses QwenVL-Max using open-source data only.
☆38 Updated last year
Jaykef / ai-algorithms
First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting…
☆179 Updated 4 months ago
waltonfuture / InstructionGPT-4
InstructionGPT-4
☆42 Updated last year
enrico310786 / image_text_retrieval_BLIP_BLIP2
Experiments with LAVIS library to perform image2text and text2image retrieval with BLIP and BLIP2 models
☆15 Updated 2 years ago