apple / ml-4m
4M: Massively Multimodal Masked Modeling
⭐1,686 · Updated this week
Alternatives and similar repositories for ml-4m:
Users interested in ml-4m are comparing it to the libraries listed below.
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ⭐1,930 · Updated 6 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ⭐1,182 · Updated 2 months ago
- Recipes for shrinking, optimizing, customizing cutting-edge vision models. 💜 ⭐1,197 · Updated this week
- This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training". ⭐828 · Updated 2 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ⭐841 · Updated this week
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ⭐1,854 · Updated 3 months ago
- Official repository for our work on micro-budget training of large-scale diffusion models. ⭐1,246 · Updated last month
- Code for the BLT (Byte Latent Transformer) research paper ⭐1,403 · Updated this week
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ⭐2,606 · Updated this week
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering ⭐1,352 · Updated 2 months ago
- Famous Vision Language Models and Their Architectures ⭐646 · Updated last week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud. ⭐2,916 · Updated last week
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ⭐2,373 · Updated this week
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ⭐568 · Updated last year
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ⭐897 · Updated last month
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ⭐697 · Updated 7 months ago
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ⭐1,832 · Updated 3 weeks ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ⭐914 · Updated 2 weeks ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ⭐1,576 · Updated 6 months ago
- ⭐599 · Updated last year
- Schedule-Free Optimization in PyTorch (see the usage sketch after this list) ⭐2,098 · Updated 2 months ago
- Code for the Molmo Vision-Language Model ⭐292 · Updated 2 months ago
- Training Large Language Models to Reason in a Continuous Latent Space ⭐877 · Updated 3 weeks ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ⭐948 · Updated 3 weeks ago
- A nanoGPT-style version of Llama 3.1 ⭐1,316 · Updated 6 months ago
- NanoGPT (124M) in 3 minutes ⭐2,294 · Updated this week
- PyTorch code and models for V-JEPA self-supervised learning from video. ⭐2,793 · Updated 6 months ago
- Next-Token Prediction is All You Need ⭐2,004 · Updated 3 months ago
- Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (ICLR 2024) ⭐1,456 · Updated 7 months ago
- Large Concept Models: Language modeling in a sentence representation space ⭐1,933 · Updated 3 weeks ago
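Of the repositories above, Schedule-Free Optimization exposes the most self-contained drop-in API, so a short usage sketch may help. This is a minimal example assuming the published `schedulefree` package (`pip install schedulefree`) and its `AdamWScheduleFree` class; the model, data, and hyperparameters are placeholders chosen for illustration, not taken from the repository, so check its README for the current interface.

```python
import torch
import schedulefree

# Placeholder model; any torch.nn.Module works the same way.
model = torch.nn.Linear(10, 2)

# Schedule-free AdamW: no learning-rate schedule is needed, but the
# optimizer must be switched between train and eval modes because it
# maintains an internal average of the weights.
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

model.train()
optimizer.train()
for _ in range(100):
    x = torch.randn(32, 10)          # dummy batch
    loss = model(x).pow(2).mean()    # dummy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Before evaluation (or checkpointing), switch to the averaged weights.
model.eval()
optimizer.eval()
```

The train/eval switch is the one schedule-free-specific step: skipping `optimizer.eval()` would evaluate the fast iterates rather than the averaged weights the method is built around.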