Entropy-xcy / bitnet158 (☆68, updated 11 months ago)
Alternatives and similar repositories for bitnet158:
Users interested in bitnet158 are comparing it to the repositories listed below.
- PB-LLM: Partially Binarized Large Language Models (☆151, updated last year)
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (☆154, updated 4 months ago)
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ (☆99, updated last year)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (☆191, updated 7 months ago)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) (☆149, updated 2 months ago)
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (☆96, updated 4 months ago)
- QuIP quantization (☆50, updated 11 months ago)
- Training-free post-training efficient sub-quadratic complexity attention, implemented with OpenAI Triton (☆106, updated this week)
- Prune transformer layers (☆67, updated 8 months ago)
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs (☆39, updated 8 months ago)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (☆116, updated 2 months ago)
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models (☆246, updated 4 months ago)
- Code for studying the super weight in LLMs (☆80, updated 2 months ago)
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" (☆362, updated 11 months ago)
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference (☆116, updated 11 months ago)
- RWKV, in easy-to-read code (☆67, updated 2 months ago)
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry (☆40, updated last year)
- Fast matrix multiplications for lookup-table-quantized LLMs (☆229, updated this week)
- Get down and dirty with FlashAttention 2.0 in PyTorch; plug and play, no complex CUDA kernels (☆102, updated last year)
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" (☆266, updated last year)
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs (☆80, updated 2 weeks ago)
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" (☆215, updated 3 weeks ago)
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (☆91, updated last year)
- Official implementation of "SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks" (☆33, updated 2 weeks ago)
- This repository contains the experimental PyTorch-native float8 training UX (☆221, updated 6 months ago)
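For context, bitnet158 and several of the repositories above implement the ternary (1.58-bit) weight quantization described in "The Era of 1-bit LLMs". A minimal NumPy sketch of that paper's absmean quantizer follows; the function name is illustrative, not taken from any of the listed repos:

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale,
    following the absmean scheme of BitNet b1.58."""
    gamma = np.mean(np.abs(w)) + eps            # per-tensor scale factor
    w_q = np.clip(np.round(w / gamma), -1, 1)   # round, then clamp to ternary
    return w_q.astype(np.int8), gamma

# Usage: dequantize with w ≈ w_q * gamma
w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
```

The ternary weights allow matrix multiplication to be expressed as additions and subtractions only, which is the source of the inference speedups these repositories target.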