aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to make the key parts of the architecture easy to understand.
☆72 · Updated 2 years ago
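The LLaMA architecture that the repository re-implements replaces LayerNorm with RMSNorm and uses a SwiGLU feed-forward block. As an illustration only (this is not the repository's actual code, and the function names `rms_norm` and `swiglu` are assumed), a minimal NumPy sketch of those two components:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # LLaMA normalizes by the root-mean-square of the activations
    # instead of subtracting a mean, as standard LayerNorm does.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: silu(x @ W_gate) * (x @ W_up), projected back down.
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU (swish) activation
    return (silu * (x @ w_up)) @ w_down
```

In the real model these run inside each transformer block alongside rotary position embeddings and grouped attention; the sketch only shows the per-token math.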
Alternatives and similar repositories for LLaMA2
Users interested in LLaMA2 are comparing it to the repositories listed below.
- Official PyTorch implementation of QA-LoRA ☆141 · Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆101 · Updated last year
- A family of compressed models obtained via pruning and knowledge distillation ☆352 · Updated 10 months ago
- ☆230 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆288 · Updated 9 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆172 · Updated last year
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge. ☆83 · Updated last year
- ☆202 · Updated 10 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆446 · Updated 8 months ago
- An extension of the nanoGPT repository for training small MoE models. ☆195 · Updated 6 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆176 · Updated 6 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆201 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆338 · Updated 5 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆232 · Updated 6 months ago
- Experiments on speculative sampling with Llama models ☆128 · Updated 2 years ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆385 · Updated last year
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆96 · Updated 2 years ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆248 · Updated last year
- Low-bit optimizers for PyTorch ☆131 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆161 · Updated 5 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆268 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆148 · Updated last year
- Spherical Merge PyTorch/HF format Language Models with minimal feature loss. ☆137 · Updated 2 years ago
- ☆128 · Updated last year
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆388 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆439 · Updated 11 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆195 · Updated 8 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆206 · Updated last year
- ☆222 · Updated this week
- Code for studying the super weight in LLMs ☆119 · Updated 10 months ago
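Several of the repositories above implement speculative sampling, where a small draft model proposes tokens and the target model accepts or corrects them. As a rough illustration of the accept/reject step from the DeepMind paper (the function name and shapes are assumptions, not code from any listed repo), a self-contained NumPy sketch:

```python
import numpy as np

def speculative_step(p_target, q_draft, draft_token, rng):
    """One accept/reject step of speculative sampling.

    p_target, q_draft: next-token distributions (1-D arrays summing to 1)
    draft_token: token index already sampled from q_draft by the draft model.
    Returns (token, accepted). The marginal distribution of `token`
    provably equals p_target, so no quality is lost.
    """
    p, q = p_target[draft_token], q_draft[draft_token]
    if rng.random() < min(1.0, p / q):
        return draft_token, True  # keep the draft model's token
    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False
```

In a full decoder this step is applied to each of the K draft tokens in turn, and decoding resumes from the first rejection, which is what yields the speedup when the draft model usually agrees with the target.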