apple / ml-4m
4M: Massively Multimodal Masked Modeling
☆1,607Updated last month
Related projects
Alternatives and complementary repositories for ml-4m
- PyTorch code and models for V-JEPA self-supervised learning from video.☆2,673Updated 3 months ago
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf…☆621Updated last month
- This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Model…☆697Updated 6 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.☆1,840Updated 3 months ago
- Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL☆1,390Updated this week
- Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜☆890Updated 2 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"☆803Updated 3 months ago
- Schedule-Free Optimization in PyTorch☆1,898Updated 2 weeks ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and…☆1,999Updated 2 weeks ago
- Janus-Series: Unified Multimodal Understanding and Generation Models☆1,084Updated last week
- nanoGPT style version of Llama 3.1☆1,246Updated 3 months ago
- ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert…☆1,255Updated this week
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.☆2,339Updated 2 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,763Updated 3 weeks ago
- ☆2,898Updated last month
- Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024☆1,378Updated 4 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One"☆808Updated 2 weeks ago
- ☆763Updated this week
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation☆1,324Updated 3 months ago
- UNet diffusion model in pure CUDA☆584Updated 4 months ago
- Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors a…☆1,199Updated this week
- DataComp for Language Models☆1,157Updated this week
- A family of lightweight multimodal models.☆933Updated this week
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]☆577Updated 8 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]☆664Updated 4 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆705Updated 9 months ago
- Next-Token Prediction is All You Need☆1,824Updated 3 weeks ago
- A suite of image and video neural tokenizers☆796Updated last week
- PyTorch native quantization and sparsity for training and inference☆1,585Updated this week