aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆25 · Updated last month
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the repositories listed below
- ☆31 · Updated 3 months ago
- ☆139 · Updated 3 weeks ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆446 · Updated last month
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆101 · Updated 2 months ago
- Fast and memory-efficient exact attention ☆177 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆652 · Updated this week
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 5 months ago
- Samples of good AI-generated CUDA kernels ☆84 · Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆156 · Updated last year
- LLM inference on consumer devices ☆121 · Updated 4 months ago
- llama.cpp to PyTorch converter ☆33 · Updated last year
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) on Windows ZLUDA en… ☆43 · Updated 10 months ago
- GPU benchmark ☆63 · Updated 5 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆277 · Updated last month
- Automatically quantize GGUF models ☆187 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆135 · Updated this week
- ☆71 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆436 · Updated this week
- A pipeline-parallel training script for LLMs. ☆153 · Updated 2 months ago
- Testing LLM reasoning abilities with family-relationship quizzes. ☆62 · Updated 5 months ago
- AI Tensor Engine for ROCm ☆232 · Updated this week
- KV cache compression for high-throughput LLM inference ☆132 · Updated 5 months ago
- Code for data-aware compression of DeepSeek models ☆36 · Updated last month
- ☆41 · Updated 3 weeks ago
- InferX is an Inference Function-as-a-Service platform ☆116 · Updated 2 weeks ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list) ☆87 · Updated this week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆528 · Updated this week
- ☆17 · Updated 7 months ago
- LLM inference in C/C++ ☆94 · Updated this week
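Several of the entries above are inference engines with Python APIs. As a point of reference, here is a minimal sketch of vLLM's offline inference API; the model checkpoint and sampling values are placeholder assumptions, not part of any repository listed above.

```python
# Minimal sketch of vLLM's offline inference API (pip install vllm).
# The model name and sampling values are placeholder assumptions;
# substitute any Hugging Face-compatible checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() accepts a list of prompts and batches them internally.
outputs = llm.generate(["Explain paged attention in one sentence."], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```

Passing a list of prompts to a single `generate` call lets vLLM batch requests internally, which is where its throughput advantage over one-at-a-time inference shows.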