mlfoundations / MINT-1TLinks

MINT-1T: A one trillion token multimodal interleaved dataset.

☆819

Alternatives and similar repositories for MINT-1T

Users that are interested in MINT-1T are comparing it to the libraries listed below

Sorting:

facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆447Updated 2 months ago
XuezheMax / megalodon
Reference implementation of Megalodon 7B model
☆522Updated 2 months ago
DLYuanGod / TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
☆1,294Updated last year
facebookresearch / MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert…
☆1,478Updated last week
AviSoori1x / makeMoE
From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
☆730Updated 8 months ago
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆645Updated last year
facebookresearch / MobileLLM
MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
☆1,310Updated 3 months ago
rhymes-ai / Aria
Codebase for Aria - an Open Multimodal Native MoE
☆1,059Updated 6 months ago
GAIR-NLP / anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
☆775Updated last month
01-ai / Yi-1.5
Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability.
☆554Updated 8 months ago
allenai / mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
☆934Updated 4 months ago
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,394Updated last year
google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆644Updated last month
allenai / OLMoE
OLMoE: Open Mixture-of-Experts Language Models
☆814Updated 4 months ago
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆983Updated last year
labmlai / inspectus
LLM Analytics
☆673Updated 9 months ago
HyperGAI / HPT
HPT - Open Multimodal LLMs from HyperGAI
☆316Updated last year
allenai / unified-io-2
☆616Updated last year
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆751Updated last year
microsoft / Samba
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
☆897Updated 2 months ago
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,039Updated 3 weeks ago
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆983Updated 2 weeks ago
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆397Updated 8 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
☆382Updated 3 months ago
SkyworkAI / Skywork-OR1
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
☆685Updated last month
XueFuzhao / OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,567Updated last year
facebookresearch / chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
☆2,037Updated 11 months ago
Haiyang-W / TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆565Updated 5 months ago
microsoft / LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.
☆531Updated 3 weeks ago
apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,331Updated 3 months ago