AlexBuz / llama-zip
LLM-powered lossless compression tool
☆279 · Updated 8 months ago
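For context on what an "LLM-powered lossless compression tool" does under the hood: tools in this vein drive an entropy coder with a language model's next-symbol predictions, so text the model predicts well costs very few bits. The sketch below is not llama-zip's code; it stands in a tiny adaptive byte-bigram model for the LLM and a rank-based encoding for a real arithmetic coder (the `BigramModel`, `compress`, and `decompress` names are illustrative), but the round-trip shows the core principle: compressor and decompressor replay the same predictive model so their states stay in sync.

```python
# Conceptual sketch of model-driven lossless compression (not llama-zip's actual code).
# A predictive model ranks candidate next bytes; well-predicted bytes get rank 0,
# which an entropy coder could store in very few bits. Decompression replays the
# identical model to invert the ranks.
from collections import defaultdict


class BigramModel:
    """Adaptive order-1 byte model standing in for an LLM's next-token predictor."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def ranked_symbols(self, context: int) -> list[int]:
        # Bytes sorted from most to least expected after `context`; unseen bytes follow
        # in numeric order, so the ranking is total and deterministic.
        seen = sorted(self.counts[context], key=lambda b: -self.counts[context][b])
        return seen + [b for b in range(256) if b not in self.counts[context]]

    def update(self, context: int, symbol: int) -> None:
        self.counts[context][symbol] += 1


def compress(data: bytes) -> list[int]:
    model, ranks, prev = BigramModel(), [], 0
    for b in data:
        ranks.append(model.ranked_symbols(prev).index(b))  # predictable bytes -> small ranks
        model.update(prev, b)
        prev = b
    return ranks  # a real compressor would entropy-code these small integers


def decompress(ranks: list[int]) -> bytes:
    model, out, prev = BigramModel(), bytearray(), 0
    for r in ranks:
        b = model.ranked_symbols(prev)[r]  # identical model state inverts each rank
        out.append(b)
        model.update(prev, b)
        prev = b
    return bytes(out)


if __name__ == "__main__":
    sample = b"the better the model predicts the text, the smaller the ranks get"
    ranks = compress(sample)
    assert decompress(ranks) == sample  # lossless round-trip
    print(f"mean rank: {sum(ranks) / len(ranks):.2f} (lower means more compressible)")
```

Swapping the bigram model for an actual LLM's token distribution, and the rank list for an arithmetic coder, is the step that turns this toy into something like llama-zip.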
Alternatives and similar repositories for llama-zip:
Users interested in llama-zip are comparing it to the libraries listed below
- Experimental adventure game with AI-generated content ☆109 · Updated this week
- A fast batching API to serve LLM models ☆182 · Updated 11 months ago
- LLM-based code completion engine ☆181 · Updated 2 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆742 · Updated 2 weeks ago
- 1.58-bit LLaMa model ☆81 · Updated last year
- AI management tool ☆114 · Updated 5 months ago
- A multimodal, function calling powered LLM webui. ☆214 · Updated 6 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆71 · Updated 6 months ago
- ☆284 · Updated 2 weeks ago
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆124 · Updated this week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆551 · Updated 2 months ago
- Low-Rank adapter extraction for fine-tuned transformers models ☆171 · Updated 11 months ago
- Automatically quantize GGUF models ☆167 · Updated last week
- This is our own implementation of 'Layer Selective Rank Reduction' ☆235 · Updated 10 months ago
- Inference of Mamba models in pure C ☆187 · Updated last year
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆223 · Updated 11 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆178 · Updated 8 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆197 · Updated 9 months ago
- Replace OpenAI with Llama.cpp Automagically. ☆314 · Updated 10 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆148 · Updated 11 months ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆799 · Updated 5 months ago
- ☆84 · Updated 3 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆195 · Updated 11 months ago
- ☆129 · Updated 8 months ago
- Fast parallel LLM inference for MLX ☆181 · Updated 9 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆292 · Updated this week
- ☆153 · Updated 9 months ago
- An implementation of bucketMul LLM inference ☆216 · Updated 9 months ago
- Train your own small bitnet model ☆65 · Updated 6 months ago
- Kosmos-2.5 is a cutting-edge Multimodal-LLM (MLLM) specializing in image OCR. However, its stringent software requirements & Python-scrip… ☆59 · Updated 8 months ago