Picovoice / llm-compression-benchmark
LLM Compression Benchmark
☆22 · Updated 3 months ago
Alternatives and similar repositories for llm-compression-benchmark
Users interested in llm-compression-benchmark are comparing it to the libraries listed below.
- Implementation of Mamba in Rust ☆88 · Updated last year
- 1.58-bit LLaMa model ☆83 · Updated last year
- Inference of Mamba models in pure C ☆193 · Updated last year
- Python bindings for ggml ☆146 · Updated last year
- ☆136 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs ☆41 · Updated last year
- Google TPU optimizations for transformers models ☆122 · Updated 10 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆202 · Updated last year
- The code behind our practical dive into using Mamba for information extraction ☆57 · Updated last year
- Automated identification of redundant layer blocks for pruning in large language models ☆257 · Updated last year
- 1.58-bit LLM on Apple Silicon using MLX ☆225 · Updated last year
- Fast parallel LLM inference for MLX ☆233 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- A compact LLM pretrained in 9 days using high-quality data ☆334 · Updated 7 months ago
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆76 · Updated 2 years ago
- RWKV-7: Surpassing GPT ☆101 · Updated last year
- Token Omission Via Attention ☆127 · Updated last year
- (Unofficial) implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307…) ☆53 · Updated 2 years ago
- Various installation guides for Large Language Models ☆77 · Updated 7 months ago
- ☆107 · Updated 3 months ago
- MLX implementation of the xLSTM model by Beck et al. (2024) ☆29 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆232 · Updated last year
- ☆67 · Updated last year
- ReLM is a Regular Expression engine for Language Models ☆107 · Updated 2 years ago
- A fast, minimalistic implementation of guided generation on Apple Silicon using Outlines and MLX ☆58 · Updated last year
- Our own implementation of "Layer-Selective Rank Reduction" ☆240 · Updated last year
- ModernBERT model optimized for the Apple Neural Engine ☆28 · Updated 10 months ago
- ☆198 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆390 · Updated last year
- Formatron empowers everyone to control the format of language models' output with minimal overhead ☆231 · Updated 5 months ago