rbalestr-lab / llm-jepa

☆141

Alternatives and similar repositories for llm-jepa

Users who are interested in llm-jepa are comparing it to the repositories listed below.

facebookresearch / Mixture-of-Transformers
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
☆129 Updated 3 months ago
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆212 Updated last month
complex-reasoning / RPG
Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆58 Updated 2 months ago
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆115 Updated last month
alexiglad / EBT
PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning
☆565 Updated last month
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from DeepMind
☆57 Updated 6 months ago
NVlabs / RLP
RLP: Reinforcement as a Pretraining Objective
☆213 Updated 2 months ago
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31 Updated 7 months ago
ChenWu98 / algorithmic-creativity
[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
☆80 Updated 6 months ago
s-sahoo / Eso-LMs
Esoteric Language Models
☆109 Updated 3 weeks ago
martin-marek / batch-size
📄 Small Batch Size Training for Language Models
☆68 Updated 2 months ago
keshik6 / grafting
[NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting
☆67 Updated 6 months ago
RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆58 Updated 9 months ago
bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆141 Updated last month
callsys / GMPO
Geometric-Mean Policy Optimization
☆95 Updated last month
convergence-ai / lm2
Official repo of paper LM2
☆46 Updated 10 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆134 Updated last month
NVlabs / hymba
☆205 Updated last year
ShadeAlsha / ICon
ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning"
☆118 Updated 5 months ago
Weixin-Liang / Mixture-of-Mamba
☆50 Updated 10 months ago
goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆116 Updated last year
hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆42 Updated 8 months ago
apple / ml-l3m
Large multi-modal models (L3M) pre-training.
☆223 Updated 2 months ago
sail-sg / Precision-RL
Defeating the Training-Inference Mismatch via FP16
☆163 Updated last month
lucidrains / h-net-dynamic-chunking
Implementation of the dynamic chunking mechanism in H-net by Hwang et al. of Carnegie Mellon
☆65 Updated 4 months ago
EPFL-VILAB / fm-vision-evals
☆72 Updated 5 months ago
RobertCsordas / moeut
☆89 Updated last year
amorehead / jvp_flash_attention
Flash Attention Triton kernel with support for second-order derivatives
☆121 Updated this week
OpenMOSS / Lorsa
☆29 Updated last month
VsonicV / es-fine-tuning-paper
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
☆277 Updated 3 weeks ago