aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆26 · Updated last month
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the libraries listed below.
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆71 · Updated 4 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆426 · Updated last month
- ☆28 · Updated 2 months ago
- LLM inference in C/C++ ☆77 · Updated this week
- Fast and memory-efficient exact attention ☆174 · Updated this week
- ☆137 · Updated this week
- LLM inference on consumer devices ☆119 · Updated 3 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆67 · Updated last week
- ☆78 · Updated this week
- InferX is an Inference Function-as-a-Service platform ☆111 · Updated last week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for stable diffusion (ComfyUI) in Windows ZLUDA en… ☆43 · Updated 10 months ago
- Samples of good AI-generated CUDA kernels ☆83 · Updated 3 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆608 · Updated this week
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆100 · Updated 2 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆408 · Updated last week
- A pipeline-parallel training script for LLMs. ☆149 · Updated last month
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆31 · Updated 2 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆154 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆129 · Updated this week
- Docker image for NVIDIA GH200 machines, optimized for vllm serving and hf trainer finetuning ☆45 · Updated 4 months ago
- These are performance benchmarks we did to prepare for our own privacy-preserving and NDA-compliant in-house AI coding assistant. If by a… ☆25 · Updated 2 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 · Updated 11 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆62 · Updated 4 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 8 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆273 · Updated last month
- Sparse inferencing for transformer-based LLMs ☆183 · Updated this week
- Lightweight inference server for OpenVINO ☆187 · Updated last week
- AI Tensor Engine for ROCm ☆208 · Updated this week
- QuIP quantization ☆54 · Updated last year
- ☆95 · Updated 6 months ago