SuriyaaMM / feather
Lower Precision Floating Point Operations
☆65 · Updated 3 weeks ago
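Since the page gives only feather's one-line description, here is a minimal standard-library Python sketch of what lower-precision floating-point behavior looks like (IEEE 754 half precision via `struct`'s `'e'` format). It is orientation only and assumes nothing about feather's actual API.

```python
# Illustrative only: IEEE 754 binary16 (half precision) round-tripping with
# the standard library. This is NOT feather's API -- just a sketch of the
# rounding behavior that lower-precision float libraries manage.
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value, then back to a Python float."""
    return struct.unpack('e', struct.pack('e', x))[0]

print(to_fp16(0.1))     # 0.0999755859375 -- only ~3 significant decimal digits survive
print(to_fp16(2049.0))  # 2048.0 -- above 2**11, the spacing between fp16 values exceeds 1
```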
Alternatives and similar repositories for feather
Users interested in feather are comparing it to the libraries listed below.
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- ☆89 · Updated last month
- NVIDIA Linux open GPU with P2P support ☆119 · Updated last month
- Sparse Inferencing for transformer based LLMs ☆218 · Updated 5 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆48 · Updated 3 months ago
- ☆163 · Updated 7 months ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆588 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆622 · Updated this week
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆154 · Updated this week
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆594 · Updated 2 months ago
- ☆71 · Updated 7 months ago
- Automatically quantize GGUF models ☆219 · Updated last month
- ☆51 · Updated last month
- ☆38 · Updated 4 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆222 · Updated last month
- Generate a llama-quantize command to copy the quantization parameters of any GGUF (see the command sketch after this list) ☆28 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆49 · Updated this week
- My personal fork of koboldcpp where I hack in experimental samplers. ☆44 · Updated last year
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆350 · Updated 9 months ago
- LLM Inference on consumer devices ☆128 · Updated 10 months ago
- ☆140 · Updated this week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆51 · Updated last year
- RAM is all you need ☆259 · Updated 2 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆283 · Updated last week
- Dictionary-based SLOP detector and analyzer for ShareGPT JSON and text ☆80 · Updated last month
- Core, Junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆69 · Updated 3 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆30 · Updated 8 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated this week
- InferX: Inference as a Service Platform ☆151 · Updated last week
- Get aid from local LLMs right in your PowerShell ☆15 · Updated 8 months ago
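For the llama-quantize helper listed above: as a hedged sketch of the kind of command such a tool emits, this wraps a plain llama.cpp `llama-quantize` invocation in Python. The quant type `Q4_K_M` and the positional `input output type` argument order are standard llama.cpp conventions; the file paths are placeholders, and the listed tool goes further by copying the per-tensor quantization parameters of a reference GGUF rather than applying a single preset like this.

```python
# A minimal sketch of invoking llama.cpp's llama-quantize binary from Python.
# Paths are placeholders; the listed tool generates a richer command that
# mirrors the quantization parameters of an existing GGUF.
import subprocess

subprocess.run(
    [
        "./llama-quantize",   # built from the llama.cpp repository
        "model-f16.gguf",     # placeholder: full-precision input
        "model-Q4_K_M.gguf",  # placeholder: quantized output
        "Q4_K_M",             # a standard llama.cpp quant type
    ],
    check=True,  # raise CalledProcessError on a non-zero exit status
)
```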