aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU kernel modules with P2P support
☆25 · Updated 2 weeks ago
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the repositories listed below.
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 4 months ago
- ☆130 · Updated 2 months ago
- LLM inference on consumer devices ☆115 · Updated 2 months ago
- LLM inference in C/C++ ☆77 · Updated 3 weeks ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆416 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆519 · Updated this week
- A pipeline-parallel training script for LLMs. ☆147 · Updated last month
- ☆27 · Updated 2 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆272 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆173 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆117 · Updated this week
- KV cache compression for high-throughput LLM inference ☆129 · Updated 4 months ago
- High-speed, easy-to-use LLM serving framework for local deployment ☆109 · Updated 2 months ago
- GPU Power and Performance Manager ☆59 · Updated 7 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆211 · Updated 6 months ago
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆43 · Updated 9 months ago
- Development repository for the Triton language and compiler ☆122 · Updated this week
- QuIP quantization ☆52 · Updated last year
- LLM inference in C/C++ ☆21 · Updated 2 months ago
- LM inference server implementation based on *.cpp. ☆203 · Updated last week
- ☆194 · Updated last month
- Samples of good AI-generated CUDA kernels ☆65 · Updated last week
- A scalable and robust tree-based speculative decoding algorithm ☆345 · Updated 4 months ago
- InferX is an Inference Function-as-a-Service platform ☆106 · Updated this week
- RWKV-7: Surpassing GPT ☆88 · Updated 6 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆153 · Updated last year
- ☆75 · Updated this week
- AI Tensor Engine for ROCm ☆201 · Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5) ☆251 · Updated 7 months ago
- ☆48 · Updated 3 weeks ago