apple / ml-l3m

Large multi-modal models (L3M) pre-training.

☆221

Alternatives and similar repositories for ml-l3m

Users who are interested in ml-l3m are comparing it to the libraries listed below.


microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆302 · Updated last month
NVlabs / hymba
☆200 · Updated 11 months ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆136 · Updated 5 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112 · Updated last month
bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆139 · Updated last week
rbalestr-lab / llm-jepa
☆135 · Updated 2 months ago
facebookresearch / Mixture-of-Transformers
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
☆125 · Updated 2 months ago
ShadeAlsha / ICon
ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning"
☆117 · Updated 5 months ago
facebookresearch / capi
Code and weights for the paper "Cluster and Predict Latents Patches for Improved Masked Image Modeling"
☆123 · Updated 7 months ago
apple / ml-sigmoid-attention
☆303 · Updated 7 months ago
RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆54 · Updated 8 months ago
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆205 · Updated 3 weeks ago
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆57 · Updated 8 months ago
devvrit / matformer
MatFormer repo
☆66 · Updated 11 months ago
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆123 · Updated 3 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185 · Updated 10 months ago
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆63 · Updated last month
apple / ml-planner
☆56 · Updated last year
NVlabs / RLP
RLP: Reinforcement as a Pretraining Objective
☆201 · Updated last month
apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆147 · Updated last month
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85 · Updated 2 months ago
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind
☆57 · Updated 6 months ago
deepreinforce-ai / CUDA-L1
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
☆243 · Updated 3 weeks ago
amorehead / jvp_flash_attention
Flash Attention Triton kernel with support for second-order derivatives
☆115 · Updated last month
ChenWu98 / algorithmic-creativity
[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
☆79 · Updated 6 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108 · Updated 8 months ago
apple / ml-ademamix
☆68 · Updated last year
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180 · Updated 5 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197 · Updated 11 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆305 · Updated 3 weeks ago