hkproj / pytorch-llama

LLaMA 2 implemented from scratch in PyTorch

☆343

Alternatives and similar repositories for pytorch-llama

Users who are interested in pytorch-llama are comparing it to the repositories listed below.

hkproj / triton-flash-attention
☆184Updated 7 months ago
hkproj / pytorch-llama-notes
Notes about LLaMA 2 model
☆66Updated last year
aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…
☆69Updated last year
hkproj / pytorch-lora
LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch
☆112Updated 2 years ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MoE models.
☆164Updated 4 months ago
shreyansh26 / FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
☆155Updated 6 months ago
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆75Updated last year
hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆101Updated last year
hkproj / transformer-from-scratch-notes
Notes about "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA)
☆296Updated 2 years ago
hkproj / quantization-notes
Notes on quantization in neural networks
☆95Updated last year
stanford-cs336 / spring2024-lectures
☆334Updated 7 months ago
huggingface / picotron_tutorial
☆206Updated 5 months ago
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆385Updated last month
NVlabs / Minitron
A family of compressed models obtained via pruning and knowledge distillation
☆347Updated 8 months ago
lucasdelimanogueira / PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆151Updated last year
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆323Updated 3 months ago
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆138Updated last year
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
☆275Updated 7 months ago
evintunador / triton_docs_tutorials
making the official triton tutorials actually comprehensible
☆53Updated last week
bkitano / llama-from-scratch
Llama from scratch, or How to implement a paper without crying
☆573Updated last year
yifanlu0227 / MIT-6.5940
All Homeworks for TinyML and Efficient Deep Learning Computing 6.5940 • Fall • 2023 • https://efficientml.ai
☆177Updated last year
andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆396Updated last month
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247Updated 6 months ago
pprp / Awesome-LLM-Quantization
Awesome list for LLM quantization
☆260Updated last month
1y33 / 100Days
GPU Kernels
☆191Updated 3 months ago
gpu-mode / profiling-cuda-in-torch
☆162Updated last year
feifeibear / LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
☆791Updated 11 months ago
fxmeng / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need
☆335Updated 3 weeks ago
microsoft / TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
☆437Updated 6 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆188Updated 2 months ago