hkproj / quantization-notes
Notes on quantization in neural networks
☆86 · Updated last year
Alternatives and similar repositories for quantization-notes
Users interested in quantization-notes are comparing it to the libraries listed below.
- ☆174 · Updated 5 months ago
- Making the official Triton tutorials actually comprehensible ☆41 · Updated 3 months ago
- GPU Kernels ☆182 · Updated last month
- ☆159 · Updated last year
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆189 · Updated 5 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 · Updated 3 weeks ago
- LoRA: Low-Rank Adaptation of Large Language Models, implemented using PyTorch ☆109 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆71 · Updated last year
- Reference implementation of the Mistral AI 7B v0.1 model ☆29 · Updated last year
- This repository contains the training code of ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆80 · Updated 3 weeks ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 · Updated last month
- An extension of the nanoGPT repository for training small MoE models ☆152 · Updated 3 months ago
- LoRA and DoRA from-scratch implementations ☆204 · Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆68 · Updated last year
- LLaMA 2 implemented from scratch in PyTorch ☆335 · Updated last year
- 100 days of building GPU kernels! ☆445 · Updated last month
- Complete implementation of Llama 2 with/without KV cache & inference 🚀 ☆46 · Updated last year
- Mixed-precision training from scratch with Tensors and CUDA ☆24 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆109 · Updated 8 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code ☆364 · Updated 3 months ago
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- Prune transformer layers ☆69 · Updated last year
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆201 · Updated last year
- ☆39 · Updated last month
- ☆204 · Updated 3 years ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆304 · Updated 3 weeks ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆108 · Updated 2 months ago
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) ☆67 · Updated 3 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆343 · Updated 7 months ago
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆91 · Updated last year