joey00072 / Multi-Head-Latent-Attention-MLA-
Working implementation of DeepSeek MLA
☆23Updated last week
Alternatives and similar repositories for Multi-Head-Latent-Attention-MLA-:
Users who are interested in Multi-Head-Latent-Attention-MLA- are comparing it to the libraries listed below
- [WIP] Transformer to embed Danbooru labelsets☆13Updated 9 months ago
- Entropy Based Sampling and Parallel CoT Decoding☆17Updated 3 months ago
- ☆24Updated 2 weeks ago
- Cerule - A Tiny Mighty Vision Model☆67Updated 4 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆50Updated 9 months ago
- Collection of autoregressive model implementation☆76Updated last week
- ☆37Updated 5 months ago
- Alternative way to calculate self-attention☆18Updated 7 months ago
- Focused on fast experimentation and simplicity☆64Updated 3 weeks ago
- Video+code lecture on building nanoGPT from scratch☆65Updated 7 months ago
- NanoGPT (124M) quality in 2.67B tokens☆24Updated this week
- look how they massacred my boy☆63Updated 3 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆60Updated 2 months ago
- A collection of optimizers for MLX☆25Updated this week
- BH hackathon☆14Updated 9 months ago
- implementation of https://arxiv.org/pdf/2312.09299☆20Updated 6 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts.☆44Updated last month
- Fast approximate inference on a single GPU with sparsity aware offloading☆38Updated last year
- An introduction to LLM Sampling☆75Updated last month
- ☆46Updated 2 months ago
- A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAE's trained by Joseph Bloom o…☆17Updated 3 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆21Updated last month
- Latent Large Language Models☆17Updated 4 months ago
- ☆65Updated 7 months ago
- ☆30Updated 3 months ago
- smolLM with Entropix sampler on pytorch☆147Updated 2 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆26Updated this week
- Train, tune, and infer Bamba model☆75Updated this week
- ☆48Updated 3 weeks ago