hkproj / pytorch-llama-notes
Notes about LLaMA 2 model
☆53 · Updated last year
Alternatives and similar repositories for pytorch-llama-notes:
Users that are interested in pytorch-llama-notes are comparing it to the libraries listed below.
- A repository dedicated to evaluating the performance of quantized LLaMA 3 using various quantization methods. ☆178 · Updated last month
- Notes and commented code for RLHF (PPO) ☆69 · Updated 11 months ago
- LLaMA 2 implemented from scratch in PyTorch ☆294 · Updated last year
- Notes on quantization in neural networks ☆70 · Updated last year
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆42 · Updated 5 months ago
- Reference implementation of Mistral AI 7B v0.1 model ☆28 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆54 · Updated 10 months ago
- Notes on the Mistral AI model ☆18 · Updated last year
- Awesome list for LLM quantization ☆170 · Updated last month
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆61 · Updated last year
- awesome llm plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLMs for coding, robotics, reasoning, multimodal, etc. ☆188 · Updated this week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆272 · Updated last week
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 · Updated 8 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆426 · Updated last week
- Advanced quantization algorithm for LLMs/VLMs ☆372 · Updated this week
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs ☆157 · Updated 7 months ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆114 · Updated 2 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed" ☆160 · Updated 2 months ago
- Survey of Small Language Models from Penn State, ... ☆156 · Updated last month
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆186 · Updated 5 months ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆184 · Updated last year
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆107 · Updated 2 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆324 · Updated 3 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆389 · Updated 4 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆240 · Updated last month