olealgoritme / gddr6
Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs.
☆ 98 · Updated this week
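For context on how such temperature readers are typically consumed on Linux: many GPU drivers (e.g. amdgpu) expose sensors through the kernel's standard hwmon sysfs interface, where `temp*_input` files report millidegrees Celsius. The sketch below is a generic hwmon reader, not the gddr6 tool's own method (which targets NVIDIA VRAM sensors that the proprietary driver does not expose via hwmon); the chip and label names in the comments are illustrative assumptions.

```python
import glob
import os

def read_hwmon_temps(base="/sys/class/hwmon"):
    """Read temperature sensors exposed via the Linux hwmon sysfs API.

    Returns {chip_name: {sensor_label: degrees_C}}. Values in the
    temp*_input files are integers in millidegrees Celsius.
    """
    temps = {}
    for hwmon in sorted(glob.glob(os.path.join(base, "hwmon*"))):
        try:
            # Each hwmon node names its chip, e.g. "amdgpu" or "k10temp".
            with open(os.path.join(hwmon, "name")) as f:
                chip = f.read().strip()
        except OSError:
            continue
        sensors = {}
        for inp in sorted(glob.glob(os.path.join(hwmon, "temp*_input"))):
            # Prefer the human-readable label (e.g. "junction", "mem")
            # when the driver provides one; fall back to the file name.
            label_path = inp.replace("_input", "_label")
            label = os.path.basename(inp)
            if os.path.exists(label_path):
                with open(label_path) as f:
                    label = f.read().strip()
            try:
                with open(inp) as f:
                    sensors[label] = int(f.read().strip()) / 1000.0
            except (OSError, ValueError):
                continue
        if sensors:
            temps[chip] = sensors
    return temps
```

On an AMD system this would typically surface edge, junction, and memory temperatures; on NVIDIA, tools like gddr6 exist precisely because the VRAM sensor is not available through this path.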
Alternatives and similar repositories for gddr6:
Users interested in gddr6 are comparing it to the repositories listed below.
- Core, junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆ 39 · Updated 4 months ago
- Prometheus exporter for GDDR6/GDDR6X VRAM and GPU core hot-spot temperatures on Linux for NVIDIA RTX 3000/4000 series GPUs ☆ 20 · Updated 6 months ago
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆ 39 · Updated last year
- Make PyTorch models at least run on APUs ☆ 52 · Updated last year
- ☆ 41 · Updated last year
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆ 305 · Updated this week
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆ 38 · Updated this week
- ☆ 296 · Updated 2 weeks ago
- 8-bit CUDA functions for PyTorch ☆ 48 · Updated 2 months ago
- Python bindings for ggml ☆ 140 · Updated 7 months ago
- Fast and memory-efficient exact attention ☆ 171 · Updated this week
- NVIDIA Linux open GPU with P2P support ☆ 16 · Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆ 150 · Updated 11 months ago
- GPU benchmark ☆ 59 · Updated 2 months ago
- My personal fork of koboldcpp where I hack in experimental samplers ☆ 45 · Updated 11 months ago
- ☆ 37 · Updated last year
- Build scripts for ROCm ☆ 189 · Updated last year
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆ 38 · Updated 7 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆ 64 · Updated last year
- llama.cpp fork with additional SOTA quants and improved performance ☆ 292 · Updated this week
- Deep Learning Primitives and Mini-Framework for OpenCL ☆ 192 · Updated 7 months ago
- 4-bit quantization of SantaCoder using GPTQ ☆ 51 · Updated last year
- Fast inference engine for Transformer models ☆ 31 · Updated 5 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆ 154 · Updated 6 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining ☆ 24 · Updated 3 weeks ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆ 123 · Updated last year
- Running SXM2/SXM3/SXM4 NVIDIA data center GPUs in consumer PCs ☆ 102 · Updated last year
- Train Llama LoRAs easily ☆ 31 · Updated last year
- Framework-agnostic Python runtime for RWKV models ☆ 146 · Updated last year
- A pipeline parallel training script for LLMs ☆ 137 · Updated 3 weeks ago