apple / ml-vfm-kt

☆14

Alternatives and similar repositories for ml-vfm-kt

Users interested in ml-vfm-kt are comparing it to the libraries listed below.

apple / ml-tic-clip
Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024
☆108 · Updated last year
apple / ml-mofi
☆59 · Updated last year
huggingface / pixparse
Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data
☆22 · Updated last year
apple / ml-ogen
☆13 · Updated last year
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆246 · Updated 10 months ago
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024
☆64 · Updated last month
ariG23498 / mmdp
☆29 · Updated 4 months ago
apple / ml-dr
A light-weight implementation of ICCV2023 paper "Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Rei…
☆83 · Updated 2 years ago
roboflow / cvevals
Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…
☆37 · Updated 2 years ago
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆181 · Updated 6 months ago
huggingface / HuggingSnap
SmolVLM2 Demo
☆178 · Updated 8 months ago
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101 · Updated last year
alenic / timm-models-explorer
Timm model explorer
☆42 · Updated last year
apple / ml-aura
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024
☆23 · Updated last year
apple / ml-vision-transformers-ane
☆88 · Updated last year
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96 · Updated 11 months ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆83 · Updated 3 months ago
autodistill / autodistill-grounded-edgesam
EdgeSAM model for use with Autodistill.
☆29 · Updated last year
elsevierlabs-os / clip-image-search
Fine-tuning OpenAI CLIP Model for Image Search on medical images
☆77 · Updated 3 years ago
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆163 · Updated 6 months ago
samar-khanna / ExPLoRA
Official code repository for ICML 2025 paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Doma…
☆47 · Updated 2 months ago
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆159 · Updated last year
LAION-AI / General-GPT
☆65 · Updated 2 years ago
gregor-ge / mBLIP
☆87 · Updated last year
apple / ml-rpm-bench
☆41 · Updated last year
XiaoduoAILab / XmodelVLM
☆69 · Updated last year
mfarre / Video-LLaVA-7B-hf-CinePile
Video-LLaVA fine-tune for CinePile evaluation
☆51 · Updated last year
apple / ml-l3m
Large multi-modal models (L3M) pre-training.
☆222 · Updated 2 months ago
togethercomputer / Dragonfly
☆80 · Updated last year
facebookresearch / MultiModalExplorer
Visualize multi-modal embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll…
☆27 · Updated last year