apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,143 · Updated last month
Alternatives and similar repositories for ml-aim:
Users interested in ml-aim are comparing it to the repositories listed below.
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,337 · Updated last month
- 4M: Massively Multimodal Masked Modeling ☆1,666 · Updated 3 months ago
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆792 · Updated last month
- A suite of image and video neural tokenizers ☆1,478 · Updated this week
- When do we not need larger vision models? ☆354 · Updated last month
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,460 · Updated 5 months ago
- VisionLLM Series ☆977 · Updated 2 weeks ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆892 · Updated this week
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆814 · Updated last month
- ☆588 · Updated 11 months ago
- LLM2CLIP makes a SOTA pretrained CLIP model even stronger ☆442 · Updated this week
- Hiera: A fast, powerful, and simple hierarchical vision transformer ☆940 · Updated 10 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆1,327 · Updated 2 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆720 · Updated 11 months ago
- DataComp: In search of the next generation of multimodal datasets ☆674 · Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆810 · Updated 2 weeks ago
- A family of lightweight multimodal models ☆972 · Updated last month
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆238 · Updated 4 months ago
- Famous Vision Language Models and Their Architectures ☆565 · Updated 4 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆831 · Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design ☆1,823 · Updated 2 months ago
- [ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆681 · Updated 3 months ago
- Next-Token Prediction is All You Need ☆1,965 · Updated 2 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,106 · Updated 9 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more ☆2,516 · Updated 3 weeks ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR ☆1,908 · Updated 5 months ago
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆556 · Updated last year
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer ☆348 · Updated this week
- ☆3,272 · Updated 3 months ago
- This repo contains the code for a 1D tokenizer and generator ☆645 · Updated this week