joey00072 / Multi-Head-Latent-Attention-MLA-

Working implementation of DeepSeek MLA

☆42

Alternatives and similar repositories for Multi-Head-Latent-Attention-MLA-

Users who are interested in Multi-Head-Latent-Attention-MLA- are comparing it to the libraries listed below.

JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆103 Updated 4 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆56 Updated last year
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆68 Updated 3 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 3 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆190 Updated 8 months ago
QuixiAI / grokadamw
☆134 Updated 11 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94 Updated 8 months ago
kyleliang919 / Super_Muon
☆60 Updated 4 months ago
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28 Updated 3 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 Updated 6 months ago
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 Updated 9 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆81 Updated 2 months ago
kmohan321 / Research_Papers
☆46 Updated 4 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆184 Updated 6 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 6 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? we'll see.
☆66 Updated 2 weeks ago
xjdr-alt / llmri
look how they massacred my boy
☆63 Updated 9 months ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆245 Updated 3 weeks ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 Updated 2 months ago
Think-a-Tron / evolve
open source alpha evolve
☆66 Updated 2 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198 Updated last year
tensoic / Cerule
Cerule - A Tiny Mighty Vision Model
☆66 Updated 10 months ago
naklecha / llm-inference-optimizations-explained
in this repository, i'm going to implement increasingly complex llm inference optimizations
☆64 Updated 2 months ago
thepowerfuldeez / OLMo
My fork of Allen AI's OLMo for educational purposes.
☆30 Updated 8 months ago
leloykun / modded-nanogpt
NanoGPT (124M) quality in 2.67B tokens
☆28 Updated last month
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 Updated 7 months ago
SinatrasC / entropix-smollm
smolLM with Entropix sampler on pytorch
☆150 Updated 9 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 6 months ago
okarthikb / state-space-models
☆27 Updated last year
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆76 Updated 7 months ago