apple / ml-aim
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
☆1,100 · Updated 3 weeks ago
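For quick orientation, below is a minimal sketch of loading an AIMv2 image encoder and extracting features. It assumes the AIMv2 checkpoints published on the Hugging Face Hub and the standard transformers AutoModel API; the checkpoint name `apple/aimv2-large-patch14-224` and the output fields are assumptions, so check the repo's README for the authoritative usage.

```python
# Minimal sketch (assumed API): load an AIMv2 vision encoder via Hugging Face
# transformers. The checkpoint name is an assumption; see the ml-aim README
# for the officially supported loading path.
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-224")
model = AutoModel.from_pretrained(
    "apple/aimv2-large-patch14-224",
    trust_remote_code=True,  # AIMv2 ships custom modeling code on the Hub
)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # patch-level image features
```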
Alternatives and similar repositories for ml-aim:
Users interested in ml-aim are comparing it to the repositories listed below.
- A suite of image and video neural tokenizers ☆983 · Updated last month
- ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,292 · Updated last week
- 4M: Massively Multimodal Masked Modeling ☆1,638 · Updated 2 months ago
- This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆750 · Updated 3 weeks ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆848 · Updated last week
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,799 · Updated last month
- Hiera: A fast, powerful, and simple hierarchical vision transformer. ☆925 · Updated 9 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆2,444 · Updated last week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆821 · Updated last week
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,398 · Updated 4 months ago
- A novel implementation of fusing ViT with Mamba into a fast, agile, and high-performance multi-modal model. Powered by Zeta, the simplest… ☆440 · Updated last month
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆1,232 · Updated last month
- PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆814 · Updated last week
- [ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆659 · Updated 2 months ago
- When do we not need larger vision models? ☆342 · Updated last week
- [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding ☆860 · Updated 5 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆717 · Updated 10 months ago
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆541 · Updated 11 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆797 · Updated 3 weeks ago
- VisionLLM Series ☆956 · Updated 2 months ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆235 · Updated 3 months ago
- Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆442 · Updated last week
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ☆2,159 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,871 · Updated 4 months ago
- ☆580 · Updated 10 months ago
- Official PyTorch Implementation of "Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think" ☆734 · Updated 2 weeks ago
- Schedule-Free Optimization in PyTorch ☆1,990 · Updated 2 weeks ago
- DataComp: In search of the next generation of multimodal datasets ☆667 · Updated 11 months ago
- LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA. ☆390 · Updated this week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆772 · Updated last week