apple / ml-4m
4M: Massively Multimodal Masked Modeling
☆1,600 · Updated last month
Related projects
Alternatives and complementary repositories for ml-4m
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,823 · Updated 3 months ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ☆1,968 · Updated last week
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆613 · Updated 3 weeks ago
- This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Model… ☆696 · Updated 6 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,749 · Updated last week
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,245 · Updated 2 weeks ago
- Schedule-Free Optimization in PyTorch ☆1,864 · Updated this week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆801 · Updated 2 months ago
- Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL ☆1,381 · Updated this week
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,664 · Updated 3 months ago
- NanoGPT (124M) quality in 8.2 minutes ☆911 · Updated this week
- A native PyTorch Library for large model training ☆2,566 · Updated this week
- Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜 ☆860 · Updated last month
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation ☆913 · Updated last week
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,117 · Updated this week
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆2,321 · Updated 2 months ago
- nanoGPT-style version of Llama 3.1 ☆1,229 · Updated 3 months ago
- Mixture-of-Experts for Large Vision-Language Models ☆1,971 · Updated 5 months ago
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… ☆1,187 · Updated this week
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ☆571 · Updated 8 months ago
- PyTorch native quantization and sparsity for training and inference ☆1,541 · Updated this week
- Hiera: A fast, powerful, and simple hierarchical vision transformer. ☆891 · Updated 8 months ago
- PyTorch native finetuning library ☆4,267 · Updated this week
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,297 · Updated 2 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆785 · Updated this week
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆523 · Updated 10 months ago
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ☆1,246 · Updated 6 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆703 · Updated 9 months ago