99991/pygguf
GGUF parser in Python
☆26 · Updated 8 months ago

Alternatives and similar repositories for pygguf:
Users interested in pygguf are comparing it to the libraries listed below.
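For context on what a GGUF parser does, here is a minimal sketch of reading the fixed GGUF file header (magic, version, tensor count, metadata key/value count, all little-endian, per the GGUF format specification). `read_gguf_header` is an illustrative helper, not pygguf's actual API:

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header and return its fields as a dict."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # All header integers are little-endian.
        version, = struct.unpack("<I", f.read(4))          # uint32
        tensor_count, = struct.unpack("<Q", f.read(8))     # uint64
        metadata_kv_count, = struct.unpack("<Q", f.read(8))  # uint64
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }
```

The metadata key/value pairs and tensor descriptors that follow the header are typed and variable-length, which is where a full parser like pygguf spends most of its effort.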
- Python bindings for ggml ☆140 · Updated 7 months ago
- GGML implementation of the BERT model with Python bindings and quantization ☆56 · Updated last year
- QuIP quantization ☆51 · Updated last year
- Some random tools for working with the GGUF file format ☆25 · Updated last year
- Experiments with BitNet inference on CPU ☆53 · Updated last year
- RWKV-7: Surpassing GPT ☆83 · Updated 5 months ago
- Inference of Mamba models in pure C ☆187 · Updated last year
- ☆46 · Updated 9 months ago
- RWKV in nanoGPT style ☆189 · Updated 10 months ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Updated 5 months ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code ☆71 · Updated 2 months ago
- ☆16 · Updated last year
- llama.cpp to PyTorch converter ☆33 · Updated last year
- A fast RWKV tokenizer written in Rust ☆44 · Updated 3 weeks ago
- ☆71 · Updated 4 months ago
- SparseGPT + GPTQ compression of LLMs like LLaMA, OPT, and Pythia ☆41 · Updated 2 years ago
- Simple model similarities analysis ☆21 · Updated last year
- Code for data-aware compression of DeepSeek models ☆20 · Updated 2 weeks ago
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆35 · Updated last year
- Repository for CPU kernel generation for LLM inference ☆26 · Updated last year
- Inference of RWKV v7 in pure C ☆31 · Updated 3 weeks ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated last year
- Explore training for quantized models ☆17 · Updated 3 months ago
- ☆53 · Updated 10 months ago
- Demonstration that finetuning a RoPE model on sequences longer than its pre-training length extends the model's context limit ☆63 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆274 · Updated last year
- Boosting 4-bit inference kernels with 2:4 sparsity ☆72 · Updated 7 months ago
- Dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios ☆95 · Updated 8 months ago
- ☆33 · Updated 10 months ago