hkproj / multi-latent-attention

☆45

Alternatives and similar repositories for multi-latent-attention

Users who are interested in multi-latent-attention are comparing it to the libraries listed below.

MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196 Updated 6 months ago
kmohan321 / Research_Papers
☆46 Updated 8 months ago
1y33 / 100Days
GPU Kernels
☆209 Updated 7 months ago
ThinamXx / Meta-llama
Complete implementation of Llama2 with/without KV cache & inference 🚀
☆48 Updated last year
ariG23498 / gemma3-object-detection
Fine-tune Gemma 3 on an object detection task
☆89 Updated 4 months ago
ThinamXx / build-GPT
Building GPT ...
☆18 Updated last year
ariG23498 / fine-tune-paligemma
Notebooks for fine-tuning PaliGemma
☆117 Updated 7 months ago
changjonathanc / flex-nano-vllm
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
☆313 Updated last month
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MoE models.
☆215 Updated 8 months ago
cornstarch-org / Cornstarch
☆113 Updated 2 months ago
YuvrajSingh-mist / Paper-Replications
A repository of paper and architecture replications of classic and SOTA AI/ML papers in PyTorch
☆392 Updated 3 weeks ago
joey00072 / Multi-Head-Latent-Attention-MLA-
Working implementation of DeepSeek MLA
☆45 Updated 10 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73 Updated 7 months ago
ariG23498 / quantized-diffusion-inference
Notebook and scripts that showcase running quantized diffusion models on consumer GPUs
☆38 Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆74 Updated last year
huggingface / picotron_tutorial
☆224 Updated last week
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆87 Updated last year
AviSoori1x / seemore
From-scratch implementation of a vision language model in pure PyTorch
☆251 Updated last year
hkproj / triton-flash-attention
☆222 Updated 11 months ago
fangyuan-ksgk / Tiny-GRPO
Minimal GRPO implementation from scratch
☆100 Updated 8 months ago
naklecha / llm-inference-optimizations-explained
In this repository, I'm going to implement increasingly complex LLM inference optimizations
☆70 Updated 6 months ago
hkproj / pytorch-lora
LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch
☆117 Updated 2 years ago
evintunador / triton_docs_tutorials
Making the official Triton tutorials actually comprehensible
☆75 Updated 3 months ago
JoeLi12345 / nGPT
An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108 Updated 8 months ago
Lossfunk / KernelBench-v2
KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems
☆21 Updated 5 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 7 months ago
huggingface / kernels
Load compute kernels from the Hub
☆337 Updated last week
huggingface / ai-deadlines
⏰ AI conference deadline countdowns
☆290 Updated last week
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆302 Updated last month
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆347 Updated 7 months ago