matt-c1 / llama-3-quant-comparison
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
☆153Updated last year
Alternatives and similar repositories for llama-3-quant-comparison
Users who are interested in llama-3-quant-comparison are comparing it to the libraries listed below:
- Low-Rank adapter extraction for fine-tuned transformers models☆171Updated last year
- ☆90Updated 5 months ago
- A multimodal, function calling powered LLM webui.☆214Updated 8 months ago
- A fast batching API to serve LLM models☆181Updated last year
- Easily view and modify JSON datasets for large language models☆75Updated 2 weeks ago
- ☆71Updated last week
- ☆128Updated last month
- ☆287Updated last month
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs☆375Updated this week
- 1.58-bit LLaMa model☆81Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆236Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction'☆238Updated last year
- A pipeline parallel training script for LLMs.☆145Updated last month
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text☆69Updated 7 months ago
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation …☆185Updated 10 months ago
- automatically quant GGUF models☆179Updated this week
- An unsupervised model merging algorithm for Transformers-based language models.☆104Updated last year
- Web UI for ExLlamaV2☆495Updated 3 months ago
- InferX is an Inference Function as a Service Platform☆105Updated this week
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines☆133Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆199Updated 10 months ago
- Merge Transformers language models by use of gradient parameters.☆206Updated 9 months ago
- LLM Inference on consumer devices☆115Updated 2 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs☆72Updated 8 months ago
- A frontend for creative writing with LLMs☆121Updated 10 months ago
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal.☆254Updated 2 months ago
- ☆157Updated 10 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆153Updated 7 months ago
- Open source LLM UI, compatible with all local LLM providers.☆174Updated 8 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.☆30Updated 2 months ago