99991 / pygguf
GGUF parser in Python
☆26 · Updated 7 months ago
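To illustrate what a GGUF parser deals with, here is a minimal sketch of reading the fixed GGUF file header in pure Python. It follows the documented GGUF layout (4-byte magic `GGUF`, little-endian `uint32` version, `uint64` tensor count, `uint64` metadata key/value count); it is not pygguf's actual API, and the function name is hypothetical.

```python
# Sketch of parsing the fixed-size GGUF header; not pygguf's API.
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse magic, version, tensor count, and metadata KV count
    from the first 24 bytes of a GGUF file (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Example with a synthetic header: version 3, 2 tensors, 5 metadata entries.
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(header))
# → {'version': 3, 'tensors': 2, 'metadata_kv': 5}
```

After this header come the metadata key/value pairs and tensor descriptors, which is where the bulk of a real parser's work lies.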
Alternatives and similar repositories for pygguf:
Users interested in pygguf are comparing it to the libraries listed below.
- QuIP quantization ☆52 · Updated last year
- Experiments with BitNet inference on CPU ☆53 · Updated 11 months ago
- Python bindings for ggml ☆140 · Updated 6 months ago
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- Demonstration that finetuning a RoPE model on longer sequences than it was pre-trained on extends the model's context limit ☆63 · Updated last year
- RWKV-7: Surpassing GPT ☆82 · Updated 4 months ago
- GPU benchmark ☆57 · Updated 2 months ago
- Some random tools for working with the GGUF file format ☆25 · Updated last year
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs ☆79 · Updated 2 weeks ago
- ☆49 · Updated last year
- ☆53 · Updated 9 months ago
- ☆21 · Updated 3 weeks ago
- A fast RWKV tokenizer written in Rust ☆44 · Updated last week
- RWKV in nanoGPT style ☆187 · Updated 9 months ago
- ☆69 · Updated 4 months ago
- Inference of Mamba models in pure C ☆186 · Updated last year
- Advanced ultra-low-bitrate compression techniques for the LLaMA family of LLMs ☆111 · Updated last year
- Boosting 4-bit inference kernels with 2:4 sparsity ☆71 · Updated 6 months ago
- Make Triton easier ☆47 · Updated 9 months ago
- Load compute kernels from the Hub ☆99 · Updated this week
- ☆112 · Updated this week
- ☆46 · Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆91 · Updated this week
- ☆17 · Updated 11 months ago
- Simple implementation of speculative sampling in NumPy for GPT-2 ☆92 · Updated last year
- Repository for CPU kernel generation for LLM inference ☆25 · Updated last year
- Explore training for quantized models ☆17 · Updated 2 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance ☆104 · Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 5 months ago
- GPTQLoRA: Efficient finetuning of quantized LLMs with GPTQ ☆99 · Updated last year