99991 / pygguf

GGUF parser in Python

☆28

Alternatives and similar repositories for pygguf

Users who are interested in pygguf are comparing it to the libraries listed below.

catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54 · Updated last year
IST-DASLab / SparseFinetuning
Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry
☆42 · Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆54 · Updated last year
IST-DASLab / Quartet
☆68 · Updated this week
chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆33 · Updated last year
nyunAI / PruneGPT
☆53 · Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆92 · Updated 7 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆129 · Updated this week
AlpinDale / QuIP-for-Llama
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models
☆37 · Updated last year
cahya-wirawan / rwkv-tokenizer
A fast RWKV Tokenizer written in Rust
☆46 · Updated 2 months ago
IST-DASLab / QIGen
Repository for CPU kernel generation for LLM inference
☆26 · Updated last year
tiiuae / onebitllms
Lightweight toolkit to train and fine-tune 1.58-bit language models
☆80 · Updated last month
abetlen / ggml-python
Python bindings for ggml
☆141 · Updated 9 months ago
facebookresearch / fastgen
Simple high-throughput inference library
☆119 · Updated last month
Zyphra / Zyda_processing
☆35 · Updated last year
kaiokendev / cutoff-len-is-context-len
Demonstration that fine-tuning a RoPE model on sequences longer than those used in pre-training adapts the model's context limit
☆63 · Updated 2 years ago
IST-DASLab / MoE-Quant
Code for data-aware compression of DeepSeek models
☆35 · Updated 2 weeks ago
euclaise / supertrainer2000
☆49 · Updated last year
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140 · Updated 4 months ago
NolanoOrg / sparse_quant_llms
SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia
☆41 · Updated 2 years ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆79 · Updated 9 months ago
UmerHA / triton_util
Make Triton easier
☆46 · Updated last year
ScalingIntelligence / good-kernels
Samples of good AI-generated CUDA kernels
☆83 · Updated 3 weeks ago
gau-nernst / quantized-training
Explore training for quantized models
☆18 · Updated this week
kroggen / mamba.c
Inference of Mamba models in pure C
☆187 · Updated last year
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆72 · Updated 8 months ago
samchaineau / llm_slerp_generation
Code and materials for speeding up LLM inference using token merging
☆36 · Updated last year
NickL77 / BaldEagle
3x faster inference: an unofficial implementation of EAGLE speculative decoding
☆66 · Updated last week
wdlctc / mini-s
☆51 · Updated 7 months ago
casper-hansen / AutoAWQ_kernels
☆74 · Updated 7 months ago