hkproj / pytorch-llama-notes
Notes about LLaMA 2 model
☆54 · Updated last year
Alternatives and similar repositories for pytorch-llama-notes:
Users interested in pytorch-llama-notes are comparing it to the libraries listed below.
- LLaMA 2 implemented from scratch in PyTorch ☆307 · Updated last year
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆179 · Updated 2 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆63 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆59 · Updated 11 months ago
- Awesome list for LLM quantization ☆186 · Updated 3 months ago
- Reference implementation of the Mistral AI 7B v0.1 model. ☆28 · Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆91 · Updated last year
- Notes on the Mistral AI model ☆18 · Updated last year
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆154 · Updated 9 months ago
- Notes and commented code for RLHF (PPO) ☆77 · Updated last year
- ☆41 · Updated 11 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆277 · Updated 3 weeks ago
- A family of compressed models obtained via pruning and knowledge distillation ☆329 · Updated 4 months ago
- ☆43 · Updated 4 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆279 · Updated 2 months ago
- ☆136 · Updated 2 months ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆191 · Updated last year
- Code accompanying our publications on compression methods for transformers ☆416 · Updated 2 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed" ☆164 · Updated 3 months ago
- Notes on quantization in neural networks ☆77 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆129 · Updated last year
- ☆145 · Updated last year
- Awesome list for LLM pruning ☆212 · Updated 3 months ago
- [ICML 2024] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆393 · Updated 5 months ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆121 · Updated 3 months ago
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens. ☆209 · Updated 7 months ago
- ☆220 · Updated 9 months ago
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ☆159 · Updated last year