aju22 / LLaMA2
This repository contains an implementation of LLaMA 2 (Large Language Model Meta AI), a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to make the key parts of the architecture easy to follow.
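For orientation, one of the architectural pieces that distinguishes LLaMA-style models from the original GPT is RMSNorm in place of LayerNorm. The following is an illustrative NumPy sketch of that building block, not code taken from this repository:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMS normalization as used in LLaMA-style transformers.

    Each feature vector is rescaled to (approximately) unit
    root-mean-square, then multiplied by a learned per-feature
    gain `weight`. Unlike LayerNorm, no mean is subtracted.
    """
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy usage: a batch of 2 vectors with 8 features each.
x = np.random.default_rng(0).standard_normal((2, 8)).astype(np.float32)
w = np.ones(8, dtype=np.float32)  # identity gain for the demo
y = rms_norm(x, w)
```

With the identity gain, each output row has a root-mean-square of roughly 1.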
☆74 · Updated 2 years ago
Alternatives and similar repositories for LLaMA2
Users interested in LLaMA2 are comparing it to the repositories listed below.
- An extension of the nanoGPT repository for training small MoE models (☆215, updated 8 months ago)
- Official PyTorch implementation of QA-LoRA (☆144, updated last year)
- A family of compressed models obtained via pruning and knowledge distillation (☆356, updated 3 weeks ago)
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 (☆347, updated 6 months ago)
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed" (☆181, updated 2 weeks ago)
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" (☆175, updated last year)
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind (☆108, updated last year)
- LoRA and DoRA from Scratch Implementations (☆215, updated last year)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) (☆162, updated 7 months ago)
- Prune transformer layers (☆74, updated last year)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (☆202, updated last year)
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) (☆238, updated 8 months ago)
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models (☆257, updated last year)
- Code for studying the super weight in LLMs (☆121, updated 11 months ago)
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge (☆84, updated 2 years ago)
- Low-bit optimizers for PyTorch (☆132, updated 2 years ago)
- Explorations into some recent techniques surrounding speculative decoding (☆293, updated 11 months ago)
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) (☆205, updated last year)
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" (☆278, updated 2 years ago)
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" (☆152, updated last year)
- Code related to compression methods for transformers, accompanying our publications (☆451, updated 10 months ago)
- Easy and efficient quantization for Transformers (☆203, updated 5 months ago)
- A pipeline for LLM knowledge distillation (☆110, updated 7 months ago)
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…" (☆293, updated last year)
- Spherically merge PyTorch/HF-format language models with minimal feature loss (☆141, updated 2 years ago)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆267, updated last year)