joey00072 / Multi-Head-Latent-Attention-MLA-
Working implementation of DeepSeek MLA (Multi-Head Latent Attention)
☆30 · Updated last month
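The core idea behind MLA, which this repository implements, is to cache one small latent vector per token instead of full per-head keys and values, then reconstruct K and V from it with up-projections. The single-head NumPy sketch below illustrates that compression; all dimensions and weight names are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

# Hedged sketch of the Multi-Head Latent Attention (MLA) idea:
# cache a small latent c per token and rebuild K/V from it.
# Shapes and names are illustrative, not DeepSeek's exact ones.
rng = np.random.default_rng(0)
d_model, d_latent, t = 64, 16, 8          # model dim, latent dim, sequence length

x = rng.standard_normal((t, d_model))
w_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
w_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
w_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
w_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

c = x @ w_down                  # (t, d_latent): the only tensor the KV cache stores
k = c @ w_up_k                  # reconstructed keys   (t, d_model)
v = c @ w_up_v                  # reconstructed values (t, d_model)
q = x @ w_q

scores = q @ k.T / np.sqrt(d_model)
scores += np.triu(np.full((t, t), -np.inf), k=1)    # causal mask
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v

print(c.shape, out.shape)       # cached latent is (8, 16) vs (8, 64) each for K and V
```

The cache saving comes from storing `c` (here 16 dims per token) rather than separate 64-dim K and V tensors; the up-projections trade a little extra compute at decode time for that memory reduction.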
Alternatives and similar repositories for Multi-Head-Latent-Attention-MLA-:
Users interested in Multi-Head-Latent-Attention-MLA- are comparing it to the repositories listed below:
- Collection of autoregressive model implementations ☆81 · Updated last week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆167 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆51 · Updated 10 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Updated 4 months ago
- Focused on fast experimentation and simplicity ☆65 · Updated last month
- Cerule - A Tiny Mighty Vision Model ☆67 · Updated 5 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated 10 months ago
- look how they massacred my boy ☆63 · Updated 4 months ago
- ☆71 · Updated 6 months ago
- DeMo: Decoupled Momentum Optimization ☆180 · Updated 2 months ago
- ☆32 · Updated 2 weeks ago
- ☆49 · Updated 11 months ago
- Video+code lecture on building nanoGPT from scratch ☆65 · Updated 8 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆44 · Updated 2 months ago
- ☆47 · Updated 5 months ago
- An introduction to LLM sampling ☆75 · Updated 2 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆156 · Updated last month
- MLX port for xjdr's entropix sampler (mimics the JAX implementation) ☆63 · Updated 3 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆27 · Updated this week
- A repository for research on medium-sized language models ☆76 · Updated 8 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆95 · Updated last month
- My fork of Allen AI's OLMo for educational purposes ☆30 · Updated 2 months ago
- smolLM with the Entropix sampler in PyTorch ☆150 · Updated 3 months ago