apple / ml-4m

4M: Massively Multimodal Masked Modeling

☆1,739

Alternatives and similar repositories for ml-4m

Users who are interested in ml-4m are comparing it to the repositories listed below.

apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,317 Updated 2 months ago
apple / ml-mobileclip
This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf…
☆976 Updated 7 months ago
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆2,977 Updated last month
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆1,378 Updated last month
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,370 Updated 2 weeks ago
facebookresearch / chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
☆2,022 Updated 11 months ago
NVIDIA / Cosmos-Tokenizer
A suite of image and video neural tokenizers
☆1,637 Updated 4 months ago
facebookresearch / MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert…
☆1,466 Updated 3 months ago
merveenoyan / smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
☆1,504 Updated last month
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆1,919 Updated 8 months ago
facebookresearch / schedule_free
Schedule-Free Optimization in PyTorch
☆2,185 Updated last month
microsoft / Samba
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
☆883 Updated 2 months ago
facebookresearch / MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
☆1,305 Updated 2 months ago
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,159 Updated 3 months ago
facebookresearch / blt
Code for BLT research paper
☆1,720 Updated last month
facebookresearch / jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
☆3,110 Updated 4 months ago
roboflow / maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
☆2,580 Updated this week
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆999 Updated last year
KellerJordan / modded-nanogpt
NanoGPT (124M) in 3 minutes
☆2,751 Updated 2 weeks ago
LLaVA-VL / LLaVA-NeXT
☆3,960 Updated 3 weeks ago
pytorch / torchtune
PyTorch native post-training library
☆5,296 Updated this week
SonyResearch / micro_diffusion
Official repository for our work on micro-budget training of large-scale diffusion models.
☆1,490 Updated 5 months ago
PKU-YuanGroup / LLaVA-CoT
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,023 Updated last month
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,791 Updated 10 months ago
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆5,937 Updated 2 weeks ago
SkunkworksAI / BakLLaVA
☆710 Updated last year
openai / consistencydecoder
Consistency Distilled Diff VAE
☆2,191 Updated last year
NVlabs / RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
☆1,228 Updated this week
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆977 Updated 5 months ago
jiaweizzhao / GaLore
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
☆1,573 Updated 8 months ago