aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process, and the code is restructured and heavily commented to make the key parts of the architecture easy to follow.
☆56 · Updated last year
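To give a sense of the architecture the repository walks through, here is a minimal sketch of a LLaMA-style transformer block (RMSNorm, rotary position embeddings, SwiGLU feed-forward). All names and dimensions below are illustrative and are not taken from the repository's code:

```python
# Minimal LLaMA-style transformer block (illustrative sketch, not the repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization, used by LLaMA in place of LayerNorm."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def rope_cache(seq_len, head_dim, base=10000.0):
    """Precompute cos/sin tables for rotary position embeddings."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)        # (seq_len, head_dim)
    return emb.cos(), emb.sin()

def apply_rope(x, cos, sin):
    """Rotate channel pairs by a position-dependent angle."""
    x1, x2 = x.chunk(2, dim=-1)
    rotated = torch.cat((-x2, x1), dim=-1)
    return x * cos + rotated * sin

class LlamaBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.wq, self.wk, self.wv, self.wo = (nn.Linear(dim, dim, bias=False) for _ in range(4))
        hidden = int(8 * dim / 3)                      # SwiGLU width, roughly as in LLaMA
        self.w1 = nn.Linear(dim, hidden, bias=False)   # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)   # up projection
        self.w2 = nn.Linear(hidden, dim, bias=False)   # down projection
        self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)

    def forward(self, x, cos, sin):
        b, t, d = x.shape
        h = self.attn_norm(x)
        q, k, v = (w(h).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for w in (self.wq, self.wk, self.wv))
        q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(out.transpose(1, 2).reshape(b, t, d))
        h = self.ffn_norm(x)
        return x + self.w2(F.silu(self.w1(h)) * self.w3(h))

block = LlamaBlock()
cos, sin = rope_cache(seq_len=16, head_dim=512 // 8)
y = block(torch.randn(2, 16, 512), cos, sin)           # -> (2, 16, 512)
```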
Alternatives and similar repositories for LLaMA2:
Users interested in LLaMA2 are comparing it to the libraries listed below.
- ☆212 · Updated 7 months ago
- Explorations into some recent techniques surrounding speculative decoding · ☆229 · Updated 3 weeks ago
- Low-bit optimizers for PyTorch · ☆125 · Updated last year
- ☆124 · Updated 11 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2 · ☆90 · Updated last year
- PB-LLM: Partially Binarized Large Language Models · ☆150 · Updated last year
- Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models" · ☆265 · Updated 4 months ago
- Official PyTorch implementation of QA-LoRA · ☆122 · Updated 10 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) · ☆171 · Updated 3 months ago
- Easy and Efficient Quantization for Transformers · ☆191 · Updated last month
- ☆140 · Updated last year
- ☆107 · Updated 3 months ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 · ☆174 · Updated last year
- GPTQ inference Triton kernel · ☆291 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks · ☆139 · Updated 4 months ago
- For releasing code related to compression methods for transformers, accompanying our publications · ☆402 · Updated this week
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆145 · Updated 7 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM · ☆152 · Updated 6 months ago
- ☆191 · Updated last month
- Multipack distributed sampler for fast padding-free training of LLMs · ☆184 · Updated 5 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization · ☆325 · Updated 5 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) · ☆204 · Updated 7 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind (a sketch of the accept/reject loop follows this list) · ☆85 · Updated 10 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache (a toy quantizer sketch follows this list) · ☆264 · Updated 3 months ago
- ☆168 · Updated 3 months ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" · ☆358 · Updated 10 months ago
- ☆212 · Updated 8 months ago
- Spherical merging of PyTorch/HF-format language models with minimal feature loss (a SLERP sketch follows this list) · ☆115 · Updated last year
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge · ☆75 · Updated last year
- ☆251 · Updated last year
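Two of the entries above implement speculative sampling from the DeepMind paper. The core idea: a cheap draft model proposes k tokens, the target model scores them, and each proposal is accepted with probability min(1, p/q), with a corrective resample on the first rejection. A minimal sketch, where `draft_probs` and `target_probs` are hypothetical stand-in callables returning next-token distributions, not real model APIs:

```python
import torch

def speculative_step(prefix, draft_probs, target_probs, k=4):
    """One speculative-decoding step: draft k tokens cheaply, verify with the target.

    `prefix` is a 1-D LongTensor of token ids; `draft_probs`/`target_probs` are
    hypothetical callables mapping a prefix to a (vocab,) probability vector.
    """
    ctx, drafted, q_list = prefix.clone(), [], []
    for _ in range(k):                                 # draft model proposes k tokens
        q = draft_probs(ctx)
        tok = torch.multinomial(q, 1)
        drafted.append(tok)
        q_list.append(q)
        ctx = torch.cat([ctx, tok])

    cur = prefix.clone()                               # verification (real code batches this pass)
    for tok, q in zip(drafted, q_list):
        p = target_probs(cur)
        if torch.rand(()) < (p[tok] / q[tok]).clamp(max=1.0):
            cur = torch.cat([cur, tok])                # accept the drafted token
        else:                                          # reject: resample from max(0, p - q)
            residual = (p - q).clamp(min=0)
            tok = torch.multinomial(residual / residual.sum(), 1)
            return torch.cat([cur, tok])
    bonus = torch.multinomial(target_probs(cur), 1)    # all k accepted: one free token
    return torch.cat([cur, bonus])
```

In expectation this yields several tokens per target-model pass while leaving the target model's output distribution unchanged.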
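KVQuant and KIVI, also listed above, shrink inference memory by quantizing the KV cache; KIVI in particular uses asymmetric 2-bit quantization, per-channel for keys and per-token for values. A toy asymmetric uniform quantizer along those lines (the axis choice and grouping here are illustrative, not the papers' exact recipes):

```python
import torch

def quantize(x, bits=2, dim=-1):
    """Asymmetric uniform quantization along `dim`: ints plus per-group scale/zero-point."""
    qmax = 2 ** bits - 1
    lo, hi = x.amin(dim=dim, keepdim=True), x.amax(dim=dim, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = ((x - lo) / scale).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.float() * scale + lo

keys = torch.randn(128, 64)                   # (cached tokens, head_dim)
q, s, z = quantize(keys, bits=2, dim=0)       # per-channel: stats taken over the token axis
err = (dequantize(q, s, z) - keys).abs().mean()
print(f"mean abs reconstruction error: {err:.4f}")
```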
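The spherical-merge entry interpolates model weights along the great circle between two checkpoints (SLERP) rather than linearly, which better preserves weight norms. A minimal per-tensor sketch, assuming `model_a` and `model_b` are state dicts with matching keys (real mergers apply this layer by layer with configurable t):

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors at fraction t."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    omega = torch.acos(a_unit.dot(b_unit).clamp(-1 + eps, 1 - eps))   # angle between them
    if omega < 1e-4:                                  # nearly parallel: fall back to lerp
        return ((1 - t) * a_flat + t * b_flat).view_as(a)
    so = torch.sin(omega)
    mixed = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return mixed.view_as(a)

# model_a / model_b are hypothetical state dicts with identical keys and shapes.
merged = {name: slerp(w_a, model_b[name], t=0.5) for name, w_a in model_a.items()}
```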