apple / ml-aim
This repository provides the code and model checkpoints for the research paper "Scalable Pre-training of Large Autoregressive Image Models".
☆696 Updated 6 months ago
Related projects
Alternatives and complementary repositories for ml-aim
- ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,245 Updated 2 weeks ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆703 Updated 9 months ago
- DataComp: In search of the next generation of multimodal datasets ☆651 Updated 10 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,302 Updated 2 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,600 Updated last month
- When do we not need larger vision models? ☆333 Updated 2 months ago
- A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance Multi-Modal Model. Powered by Zeta, the simplest… ☆435 Updated this week
- ☆569 Updated 8 months ago
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆523 Updated 10 months ago
- A suite of image and video neural tokenizers ☆379 Updated this week
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆228 Updated 2 months ago
- VisionLLM Series ☆903 Updated 3 weeks ago
- LLaVA-Interactive-Demo ☆352 Updated 3 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,823 Updated 3 months ago
- Hiera: A fast, powerful, and simple hierarchical vision transformer. ☆891 Updated 8 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆365 Updated 4 months ago
- A family of lightweight multimodal models. ☆928 Updated 2 weeks ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆801 Updated 2 months ago
- Open-MAGVIT2: Democratizing Autoregressive Visual Generation ☆686 Updated last month
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆676 Updated 3 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆785 Updated this week
- [ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆619 Updated last month
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆613 Updated 3 weeks ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆777 Updated 5 months ago
- This repo contains the code for 1D tokenizer and generator ☆527 Updated this week
- ☆193 Updated last year
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation ☆913 Updated last week
- Official implementation of SEED-LLaMA (ICLR 2024). ☆574 Updated last month
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆583 Updated 2 weeks ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆429 Updated 9 months ago