hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆319 · Updated last year
Alternatives and similar repositories for pytorch-llama:
Users interested in pytorch-llama are comparing it to the repositories listed below
- ☆153 · Updated 3 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆63 · Updated last year
- Notes about the LLaMA 2 model ☆59 · Updated last year
- LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch (see the minimal LoRA sketch after this list) ☆101 · Updated last year
- Implementation of FlashAttention in PyTorch ☆141 · Updated 3 months ago
- Notes and commented code for RLHF (PPO) ☆85 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆64 · Updated last year
- Notes about the "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA) ☆266 · Updated last year
- All homeworks for TinyML and Efficient Deep Learning Computing 6.5940 • Fall • 2023 • https://efficientml.ai ☆165 · Updated last year
- LoRA and DoRA from Scratch Implementations ☆200 · Updated last year
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆779 · Updated 3 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆93 · Updated last year
- Fast inference from large language models via speculative decoding (see the speculative-decoding sketch after this list) ☆708 · Updated 7 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆530 · Updated this week
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …" (see the GQA sketch after this list) ☆160 · Updated 11 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆334 · Updated 5 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆240 · Updated last week
- Explorations into some recent techniques surrounding speculative decoding ☆254 · Updated 3 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆510 · Updated 5 months ago
- Attention is all you need implementation ☆890 · Updated 10 months ago
- ☆238 · Updated 3 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆231 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) ☆786 · Updated this week
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆440 · Updated 4 months ago
- ☆166 · Updated 2 months ago
- A simple and effective LLM pruning approach. ☆737 · Updated 8 months ago
- Large Context Attention ☆703 · Updated 2 months ago
- ring-attention experiments ☆129 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆424 · Updated 3 months ago
- ☆153 · Updated last year
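
Several entries above (the LoRA implementation and the LoRA/DoRA from-scratch repo) revolve around LoRA. As a rough orientation, here is a minimal PyTorch sketch of the core idea: freeze a pretrained linear layer and train only a low-rank bypass. The layer name, init scheme, and `alpha / rank` scaling are illustrative assumptions, not code from any repository listed here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + scale * (B A) x."""
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                              # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)   # small random init
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))         # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base path is untouched; only the low-rank bypass receives gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

layer = LoRALinear(512, 512)
print(layer(torch.randn(2, 512)).shape)                                # torch.Size([2, 512])
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))   # only A and B are trainable
```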
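The grouped-query attention entry can likewise be summarized in a few lines: queries keep all their heads while keys and values use fewer heads, each shared by a group of query heads. This sketch assumes PyTorch 2.x (`F.scaled_dot_product_attention`) and made-up tensor shapes; it is not taken from the listed repository.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (B, n_q_heads, T, D); k, v: (B, n_kv_heads, T, D), with n_q_heads % n_kv_heads == 0.
    Each group of n_q_heads // n_kv_heads query heads attends with one shared K/V head."""
    group = q.shape[1] // k.shape[1]
    # Expand K/V heads so every query head has a matching (shared) K/V head.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

q = torch.randn(1, 8, 16, 64)    # 8 query heads
kv = torch.randn(1, 2, 16, 64)   # 2 key/value heads, each shared by 4 query heads
print(grouped_query_attention(q, kv, kv).shape)   # torch.Size([1, 8, 16, 64])
```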
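For the speculative-decoding entries, a much-simplified greedy variant conveys the control flow: a small draft model proposes `k` tokens, the large target model verifies them all in a single forward pass, and the longest agreeing prefix is kept plus one token from the target. The published method accepts or rejects by comparing draft and target probabilities (rejection sampling) rather than greedy exact match; `target` and `draft` here are assumed to be callables mapping token ids of shape `(1, T)` to logits of shape `(1, T, V)`.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ids, k=4):
    """One greedy speculative-decoding step (simplified sketch, not the published algorithm)."""
    prompt_len = ids.shape[1]
    # 1) The cheap draft model proposes k tokens autoregressively.
    for _ in range(k):
        nxt = draft(ids)[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, nxt], dim=-1)
    # 2) The target model scores prompt + drafted tokens in a single forward pass.
    target_pred = target(ids).argmax(dim=-1)            # (1, prompt_len + k): target's pick after each position
    draft_tokens = ids[:, prompt_len:]                   # the k proposed tokens
    target_choice = target_pred[:, prompt_len - 1:-1]    # target's pick at those same positions
    # 3) Accept the longest prefix on which draft and target agree,
    #    then append one target token (the correction / "bonus" token).
    n_accept = 0
    while n_accept < k and draft_tokens[0, n_accept] == target_choice[0, n_accept]:
        n_accept += 1
    bonus = target_pred[:, prompt_len - 1 + n_accept : prompt_len + n_accept]
    return torch.cat([ids[:, : prompt_len + n_accept], bonus], dim=-1)
```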