SuriyaaMM / featherLinks
Lower Precision Floating Point Operations
☆59 · Updated this week
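featherLinks is described as lower-precision floating-point operations. As a generic illustration of that theme (not code from featherLinks itself), a round trip through IEEE-754 binary16 using Python's `struct` module shows the precision such low-precision kernels trade away:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip a Python float through IEEE-754 half precision
    # using struct's 'e' (binary16) format code.
    return struct.unpack('<e', struct.pack('<e', x))[0]

val = 0.1 + 0.2            # ordinary float64 arithmetic
half = to_fp16(val)        # same value after a binary16 round trip

print(val, half)           # binary16 keeps only ~3 decimal digits
print(abs(half - val))     # small rounding error introduced by the cast
```

Values representable in binary16 (like 1.0) survive the round trip exactly; most others pick up an error on the order of the 10-bit mantissa spacing, which is the memory-for-accuracy trade-off behind lower-precision compute.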
Alternatives and similar repositories for feather
Users interested in feather are comparing it to the libraries listed below.
- NVIDIA Linux open GPU with P2P support ☆103 · Updated last month
- Sparse inference for transformer-based LLMs ☆216 · Updated 5 months ago
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆348 · Updated 8 months ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆589 · Updated last month
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆50 · Updated last year
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆616 · Updated this week
- ☆87 · Updated last month
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆189 · Updated last month
- ☆38 · Updated 3 months ago
- ☆162 · Updated 6 months ago
- Stable Diffusion and Flux in pure C/C++ ☆24 · Updated this week
- ☆48 · Updated last month
- ☆115 · Updated this week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 · Updated 2 months ago
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. ☆270 · Updated last week
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆29 · Updated 5 months ago
- ☆69 · Updated 6 months ago
- 1.58-bit LLaMa model ☆83 · Updated last year
- LLM inference on consumer devices ☆128 · Updated 9 months ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆156 · Updated this week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ☆806 · Updated this week
- InferX: Inference as a Service platform ☆146 · Updated last week
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆585 · Updated 2 weeks ago
- A pipeline-parallel training script for LLMs. ☆165 · Updated 8 months ago
- Run multiple resource-heavy large models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆86 · Updated last week
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆150 · Updated 6 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆228 · Updated this week
- KoboldCpp smart launcher with GPU layer and tensor-override tuning ☆29 · Updated 7 months ago