olealgoritme / gddr6
Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs.
☆96 · Updated 7 months ago
Alternatives and similar repositories for gddr6:
Users interested in gddr6 are comparing it to the libraries listed below:
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆39 · Updated last year
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆35 · Updated 10 months ago
- ☆40 · Updated last year
- ☆274 · Updated this week
- 8-bit CUDA functions for PyTorch ☆45 · Updated last month
- ☆37 · Updated last year
- Core, junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆33 · Updated 3 months ago
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆37 · Updated 7 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆148 · Updated 10 months ago
- Prometheus exporter for GDDR6/GDDR6X VRAM and GPU core hot-spot temperatures on Linux for NVIDIA RTX 3000/4000 series GPUs ☆18 · Updated 5 months ago
- Build scripts for ROCm ☆188 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 5 months ago
- Running SXM2/SXM3/SXM4 NVIDIA data center GPUs in consumer PCs ☆98 · Updated last year
- ☆227 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆162 · Updated this week
- Deep learning primitives and mini-framework for OpenCL ☆190 · Updated 6 months ago
- ☆54 · Updated 9 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆222 · Updated this week
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆11 · Updated 9 months ago
- GPU benchmark ☆57 · Updated 2 months ago
- Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation ☆71 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆65 · Updated last year
- Make abliterated models with transformers, easy and fast ☆64 · Updated last week
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆196 · Updated 8 months ago
- ☆112 · Updated this week
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 · Updated last year
- 8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs ☆49 · Updated last year
- ☆49 · Updated last year
- Run stable-diffusion-webui with a Radeon RX 580 8GB on Ubuntu 22.04.2 LTS ☆60 · Updated last year
- Python bindings for ggml ☆140 · Updated 6 months ago