aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to make the key parts of the architecture easy to understand.
☆70 · Updated last year
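Since the repository centers on the LLaMA 2 architecture, a brief PyTorch sketch of two of its well-known building blocks, RMSNorm and rotary positional embeddings, may help readers orient themselves. This is an illustrative sketch under generic shape assumptions, not the repository's own code; names such as `rotary_embedding` are chosen here for clarity.

```python
# Illustrative sketch (not the repository's actual code) of two LLaMA 2
# building blocks: RMSNorm and rotary positional embeddings (RoPE).
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by 1/RMS of the features, no mean subtraction."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq_len, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    # One frequency per channel pair; angle grows with token position.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos[None, :, None, :] - x2 * sin[None, :, None, :]
    out[..., 1::2] = x1 * sin[None, :, None, :] + x2 * cos[None, :, None, :]
    return out


if __name__ == "__main__":
    x = torch.randn(2, 16, 8, 64)   # (batch, seq_len, heads, head_dim)
    x = rotary_embedding(x)
    norm = RMSNorm(dim=64)
    print(norm(x).shape)            # torch.Size([2, 16, 8, 64])
```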
Alternatives and similar repositories for LLaMA2
Users interested in LLaMA2 are comparing it to the libraries listed below
- Official PyTorch implementation of QA-LoRA ☆138 · Updated last year
- A family of compressed models obtained via pruning and knowledge distillation ☆348 · Updated 9 months ago
- An extension of the nanoGPT repository for training small MoE models. ☆181 · Updated 5 months ago
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge. ☆82 · Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆99 · Updated last year
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆174 · Updated 5 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆170 · Updated last year
- LoRA and DoRA from Scratch Implementations ☆210 · Updated last year
- ☆214 · Updated 6 months ago
- ☆202 · Updated 8 months ago
- ☆226 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆161 · Updated 4 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 10 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆283 · Updated 8 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆245 · Updated last year
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆149 · Updated last year
- LLaMA 2 implemented from scratch in PyTorch ☆347 · Updated last year
- Code for studying the super weight in LLMs ☆115 · Updated 8 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆177 · Updated 11 months ago
- Prune transformer layers ☆69 · Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) ☆206 · Updated last year
- ☆127 · Updated last year
- Easy and Efficient Quantization for Transformers ☆203 · Updated 2 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆435 · Updated 10 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆331 · Updated 3 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆226 · Updated 5 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆200 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆147 · Updated 11 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆441 · Updated 7 months ago
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens. ☆244 · Updated last year