aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆31 · Updated last week
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the repositories listed below.
- ☆38 · Updated 4 months ago
- LLM Inference on consumer devices ☆124 · Updated 5 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆73 · Updated last week
- A pipeline parallel training script for LLMs. ☆154 · Updated 4 months ago
- ☆149 · Updated 2 months ago
- InferX is an Inference Function as a Service Platform ☆129 · Updated last week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆162 · Updated last year
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆524 · Updated last week
- ☆82 · Updated this week
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 6 months ago
- GPU Power and Performance Manager ☆61 · Updated 10 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆42 · Updated last month
- Sparse inferencing for transformer-based LLMs ☆197 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆484 · Updated this week
- ☆133 · Updated 3 months ago
- Automatically quantize GGUF models ☆196 · Updated last week
- 1.58 Bit LLM on Apple Silicon using MLX ☆221 · Updated last year
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆103 · Updated 4 months ago
- GPU benchmark ☆67 · Updated 7 months ago
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes… ☆33 · Updated this week
- SLOP Detector and analyzer based on a dictionary for ShareGPT JSON and text ☆73 · Updated 9 months ago
- Fast and memory-efficient exact attention ☆183 · Updated 2 weeks ago
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆127 · Updated last year
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆47 · Updated last year
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 · Updated 7 months ago
- Samples of good AI-generated CUDA kernels ☆89 · Updated 3 months ago
- ☆54 · Updated 2 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆117 · Updated 3 weeks ago
- Core, Junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆52 · Updated 3 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆298 · Updated 3 months ago