RustamyF / clip-multimodal-ml
☆58Updated last year
Alternatives and similar repositories for clip-multimodal-ml:
Users who are interested in clip-multimodal-ml are comparing it to the repositories listed below
- ☆20Updated last year
- Parameter-Efficient Fine-Tuning for Foundation Models☆57Updated 3 weeks ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b…☆72Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆91Updated 4 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆97Updated 3 months ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper: "Multimodal Contrastive Learning with…☆29Updated 2 weeks ago
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated 10 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆89Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"☆203Updated 10 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆154Updated 7 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆243Updated 4 months ago
- InstructionGPT-4☆39Updated last year
- ☆140Updated 11 months ago
- Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications☆45Updated 5 months ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model☆86Updated last year
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25]☆194Updated last month
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 5 months ago
- ☆23Updated 8 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆142Updated 9 months ago
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training☆82Updated last year
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)☆161Updated 8 months ago
- Finetuning CLIP for Few-Shot Learning☆41Updated 3 years ago
- An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.☆153Updated this week
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆69Updated 5 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆201Updated 3 months ago
- Code to train CLIP model☆111Updated 3 years ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆85Updated 3 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.☆65Updated 7 months ago
- ☆173Updated last year
- An implementation of finetuning the BLIP model for Visual Question Answering☆65Updated last year