rejunity / tiny-asic-1_58bit-matrix-mul
Tiny ASIC implementation of the matrix-multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"
☆110 · Updated 6 months ago
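The "1.58-bit" in the title refers to ternary weights, each taking a value in {-1, 0, +1} (log2(3) ≈ 1.58 bits). The key property such a matmul unit exploits is that multiplying by -1, 0, or +1 needs no multiplier: each output is just a signed sum of activations. Below is a minimal NumPy sketch of that idea (an illustration only, not the repo's actual hardware design; the function name `ternary_matmul` is made up here):

```python
import numpy as np

def ternary_matmul(x, w_ternary):
    """Multiply activations x (m, k) by ternary weights w_ternary (k, n),
    where every weight is -1, 0, or +1.

    No true multiplications are performed: for each output column,
    activations under +1 weights are added, those under -1 weights are
    subtracted, and those under 0 weights are skipped.
    """
    m, n = x.shape[0], w_ternary.shape[1]
    out = np.zeros((m, n), dtype=x.dtype)
    for j in range(n):
        plus = w_ternary[:, j] == 1    # columns to add
        minus = w_ternary[:, j] == -1  # columns to subtract
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out
```

The result matches an ordinary `x @ w_ternary`, but the inner loop contains only additions and subtractions, which is what makes a dedicated ASIC datapath for this operation so cheap in area and power.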
Related projects
Alternatives and complementary repositories for tiny-asic-1_58bit-matrix-mul
- 1.58-bit LLaMa model ☆79 · Updated 7 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 weeks ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆222 · Updated last month
- Train your own small BitNet model ☆55 · Updated 3 weeks ago
- 1.58-bit LLM on Apple Silicon using MLX ☆134 · Updated 6 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆85 · Updated 3 weeks ago
- PB-LLM: Partially Binarized Large Language Models ☆146 · Updated 11 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- ☆60 · Updated last week
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆171 · Updated 3 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆261 · Updated last year
- prime (previously called ZeroBand) is a framework for efficient, globally distributed training of AI models over the internet ☆203 · Updated this week
- Fast parallel LLM inference for MLX ☆146 · Updated 4 months ago
- Experimental BitNet implementation ☆60 · Updated 7 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- QuIP quantization ☆46 · Updated 7 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆99 · Updated last week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆105 · Updated last week
- (ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆190 · Updated 5 months ago
- GPT-2 small trained on phi-like data ☆65 · Updated 8 months ago
- An implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆30 · Updated 2 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆171 · Updated 3 weeks ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs ☆72 · Updated 3 weeks ago
- An implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated 10 months ago
- ☆43 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series ☆158 · Updated 2 months ago
- Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot", with LLaMA implementation ☆70 · Updated last year
- Video+code lecture on building nanoGPT from scratch ☆64 · Updated 4 months ago
- ☆84 · Updated last month
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆346 · Updated 8 months ago