hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆307 Updated last year
Alternatives and similar repositories for pytorch-llama:
Users who are interested in pytorch-llama are comparing it to the libraries listed below.
- LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch ☆99 Updated last year
- ☆136 Updated 2 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆63 Updated last year
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆597 Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆91 Updated last year
- Official PyTorch implementation of QA-LoRA ☆129 Updated last year
- Notes about LLaMA 2 model ☆54 Updated last year
- Awesome list for LLM quantization ☆186 Updated 3 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆248 Updated 3 months ago
- Ring attention implementation with flash attention ☆714 Updated last month
- Notes and commented code for RLHF (PPO) ☆77 Updated last year
- Implementation of FlashAttention in PyTorch ☆138 Updated 2 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,216 Updated 2 weeks ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆654 Updated this week
- Large Context Attention ☆693 Updated 2 months ago
- LoRA and DoRA from Scratch Implementations ☆198 Updated last year
- ☆220 Updated 9 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆329 Updated 4 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆201 Updated last week
- Scalable toolkit for efficient model alignment ☆750 Updated this week
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆198 Updated 3 months ago
- ☆158 Updated last month
- Official repository for ORPO ☆445 Updated 9 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆770 Updated this week
- Reference implementation of Mistral AI 7B v0.1 model. ☆28 Updated last year
- Efficient LLM Inference over Long Sequences ☆365 Updated last month
- ☆393 Updated 2 weeks ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆524 Updated last month
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆506 Updated 4 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆393 Updated 5 months ago