hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆337 · Updated last year
Alternatives and similar repositories for pytorch-llama
Users interested in pytorch-llama are comparing it to the repositories listed below.
- ☆179 · Updated 6 months ago
- Notes about the LLaMA 2 model ☆63 · Updated last year
- LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch ☆110 · Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆68 · Updated last year
- Notes and commented code for RLHF (PPO) ☆97 · Updated last year
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆150 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆72 · Updated last year
- A family of compressed models obtained via pruning and knowledge distillation ☆344 · Updated 8 months ago
- An extension of the nanoGPT repository for training small MoE models ☆160 · Updated 4 months ago
- Implementation of FlashAttention in PyTorch ☆155 · Updated 6 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆327 · Updated last week
- Notes on quantization in neural networks ☆89 · Updated last year
- ☆316 · Updated 6 months ago
- ☆198 · Updated 5 months ago
- A project to improve the skills of large language models ☆456 · Updated this week
- Awesome list for LLM quantization ☆251 · Updated last month
- Explorations into some recent techniques surrounding speculative decoding ☆272 · Updated 6 months ago
- Making the official Triton tutorials actually comprehensible ☆45 · Updated 3 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆433 · Updated 5 months ago
- LoRA and DoRA from Scratch Implementations ☆206 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆318 · Updated 2 months ago
- Minimal hackable GRPO implementation ☆252 · Updated 5 months ago
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆502 · Updated 7 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆98 · Updated last year
- Tutorial on how to build BERT from scratch ☆95 · Updated last year
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods ☆191 · Updated 6 months ago
- Scalable toolkit for efficient model alignment ☆825 · Updated last week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆529 · Updated 2 months ago
- Notes about the "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA) ☆293 · Updated 2 years ago
- ☆88 · Updated 9 months ago