RustamyF / clip-multimodal-ml
☆63 · Updated last year
Alternatives and similar repositories for clip-multimodal-ml
Users interested in clip-multimodal-ml are comparing it to the libraries listed below.
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆93 · Updated 6 months ago
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆58 · Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆48 · Updated last month
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆76 · Updated last week
- InstructionGPT-4 ☆39 · Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆160 · Updated 9 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆78 · Updated 7 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆256 · Updated 6 months ago
- Building a VLM model starting from its basic modules. ☆16 · Updated last year
- An implementation of finetuning the BLIP model for Visual Question Answering. ☆72 · Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆206 · Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group ☆153 · Updated last month
- ☆20 · Updated last year
- Finetuning CLIP for Few Shot Learning (see the CLIP usage sketch after this list). ☆42 · Updated 3 years ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper "Multimodal Contrastive Learning with… ☆31 · Updated 2 months ago
- Experiments with the LAVIS library to perform image2text and text2image retrieval with BLIP and BLIP2 models. ☆14 · Updated last year
- ☆26 · Updated 10 months ago
- ☆142 · Updated last year
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever. ☆91 · Updated 3 weeks ago
- The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆75 · Updated last month
- Collection of Tools and Papers related to Adapters / Parameter-Efficient Transfer Learning / Fine-Tuning ☆192 · Updated last year
- ☆46 · Updated 2 months ago
- [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations ☆143 · Updated last year
- Playground for Transformers ☆51 · Updated last year
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆76 · Updated 3 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆272 · Updated this week
- New generation of CLIP with fine-grained discrimination capability (ICML 2025). ☆203 · Updated last month
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆73 · Updated 9 months ago
- ☆59 · Updated 2 years ago
- A family of highly capable yet efficient large multimodal models. ☆185 · Updated 10 months ago
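Several of the entries above, like clip-multimodal-ml itself, build on CLIP-style contrastive image-text models. For orientation, here is a minimal sketch of zero-shot image-text similarity scoring with the Hugging Face transformers CLIP API; the openai/clip-vit-base-patch32 checkpoint, the example.jpg path, and the prompt texts are illustrative placeholders, and this is not code taken from any of the listed repositories.

```python
# Minimal sketch: zero-shot image-text similarity with a pretrained CLIP model
# via the Hugging Face transformers library (checkpoint and inputs are placeholders).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"  # assumed public checkpoint
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")                 # hypothetical local image
texts = ["a photo of a cat", "a photo of a dog"]  # illustrative prompts

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-to-text similarity scores; softmax turns
# them into a probability distribution over the candidate prompts.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```

The same embed-and-score pattern underlies many of the retrieval and fine-tuning repositories listed above, which swap in different backbones, adapters, or training objectives.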