olealgoritme/gddr6
A Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs.
Related projects:
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
- 8-bit CUDA functions for PyTorch, ROCm compatible
- AITemplate is a Python framework which renders neural networks into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
- Fast and memory-efficient exact attention
- 8-bit CUDA functions for PyTorch
- llama.cpp to PyTorch converter
- Build scripts for ROCm
- Efficient 3-bit/4-bit quantization of LLaMA models
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
- Python bindings for ggml
- QuIP quantization
- Make PyTorch models at least run on APUs.
- Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
- Landmark Attention: Random-Access Infinite Context Length for Transformers (QLoRA)
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
- My personal fork of koboldcpp where I hack in experimental samplers.
- An unsupervised model merging algorithm for Transformers-based language models.
- 8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs
- A community list of common phrases generated by GPT and Claude models
- My development fork of llama.cpp. For now working on RK3588 NPU and Tenstorrent backends.
- Controlling fans on my NVIDIA graphics card
- PyTorch half-precision GEMM lib with fused optional bias + optional ReLU/GELU
- A JAX implementation of the continuous-time formulation of Consistency Models
- Experiment of using Tangent to autodiff Triton