aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pre-trained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process, and the code is restructured and heavily commented to make the key parts of the architecture easy to follow.
☆74 · Updated 2 years ago
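The architectural components where LLaMA 2 departs from the original GPT recipe — RMSNorm in place of LayerNorm, rotary position embeddings (RoPE), and a SwiGLU feed-forward block — can be sketched compactly in PyTorch. The sketch below is not the repository's code: the hyperparameters (`dim=512`, `n_heads=8`) and class names are illustrative assumptions, and it omits details such as grouped-query attention and the KV cache used during inference.

```python
# Minimal sketch of core LLaMA 2 building blocks (not this repository's code).
# Dimensions and names are illustrative assumptions, not the official config.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square layer norm, used in LLaMA instead of LayerNorm."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * self.weight


def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to a (batch, seq, heads, head_dim) tensor."""
    b, t, h, d = x.shape
    freqs = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device, dtype=torch.float32) / d))
    angles = torch.outer(torch.arange(t, device=x.device, dtype=torch.float32), freqs)  # (t, d/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # rotate each (even, odd) pair by its position angle
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class SwiGLU(nn.Module):
    """LLaMA-style feed-forward block: SiLU-gated linear unit."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class TransformerBlock(nn.Module):
    """Pre-norm block: x + Attn(RMSNorm(x)), then x + FFN(RMSNorm(x))."""
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        self.attn_norm = RMSNorm(dim)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, 4 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        h = self.attn_norm(x)
        q = rotary_embed(self.wq(h).view(b, t, self.n_heads, self.head_dim))
        k = rotary_embed(self.wk(h).view(b, t, self.n_heads, self.head_dim))
        v = self.wv(h).view(b, t, self.n_heads, self.head_dim)
        # Causal self-attention over (batch, heads, seq, head_dim).
        attn = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        )
        x = x + self.wo(attn.transpose(1, 2).reshape(b, t, d))
        return x + self.ffn(self.ffn_norm(x))


if __name__ == "__main__":
    block = TransformerBlock()
    print(block(torch.randn(1, 16, 512)).shape)  # torch.Size([1, 16, 512])
```

A full model would stack many such blocks between a token embedding and an output projection, and during inference would cache keys and values across decoding steps instead of recomputing them for every new token.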
Alternatives and similar repositories for LLaMA2
Users interested in LLaMA2 are comparing it to the libraries listed below
- An extension of the nanoGPT repository for training small MoE models.☆231 · Updated 10 months ago
- Official PyTorch implementation of QA-LoRA☆145 · Updated last year
- A family of compressed models obtained via pruning and knowledge distillation☆364 · Updated 2 months ago
- ☆235 · Updated last year
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge.☆85 · Updated 2 years ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind☆105 · Updated last year
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)☆249 · Updated 10 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".☆188 · Updated 2 months ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"☆150 · Updated last year
- ☆128 · Updated 2 years ago
- For releasing code related to compression methods for transformers, accompanying our publications☆454 · Updated last year
- Code for studying the super weight in LLMs☆120 · Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024)☆205 · Updated last year
- LLaMA 2 implemented from scratch in PyTorch☆365 · Updated 2 years ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction☆390 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆355 · Updated last week
- ☆204 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155 · Updated last year
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆176 · Updated last year
- Prune transformer layers☆74 · Updated last year
- Easy and Efficient Quantization for Transformers☆202 · Updated 7 months ago
- Low-bit optimizers for PyTorch☆137 · Updated 2 years ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆201 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆260 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆445 · Updated last year
- Experiments on speculative sampling with Llama models☆127 · Updated 2 years ago
- Spherically merge PyTorch/HF-format language models with minimal feature loss.☆143 · Updated 2 years ago
- Implementation of DoRA☆306 · Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget☆169 · Updated 5 months ago
- Explorations into some recent techniques surrounding speculative decoding☆298 · Updated last year