cg123 / bitnet
Modeling code for a BitNet b1.58 Llama-style model.
☆23 · Updated 11 months ago
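For context on what the repo above models: BitNet b1.58 constrains every weight to the ternary set {-1, 0, +1} using absmean quantization, i.e. scale by the mean absolute weight, round, and clip. A minimal sketch of that scheme (the helper name `absmean_ternary` is illustrative, not from the repo):

```python
import numpy as np

def absmean_ternary(w, eps=1e-5):
    """Quantize a weight tensor to {-1, 0, +1} via absmean scaling,
    as described in the BitNet b1.58 paper (sketch, not repo code)."""
    gamma = float(np.mean(np.abs(w)))              # per-tensor scale
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_q.astype(np.int8), gamma              # ternary weights + scale

w = np.array([[0.9, -0.05, -1.3],
              [0.2,  1.1,  -0.7]])
w_q, gamma = absmean_ternary(w)
# w_q holds only -1, 0, and +1; gamma is kept to rescale activations
```

At inference time the ternary weights reduce matrix multiplies to additions and subtractions, which is the source of the claimed efficiency gains.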
Alternatives and similar repositories for bitnet:
Users who are interested in bitnet are comparing it to the libraries listed below.
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 4 months ago
- ☆49 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆42 · Updated 10 months ago
- ☆33 · Updated 10 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139 · Updated last month
- Demonstration that finetuning a RoPE model on sequences longer than its pre-training length extends the model's context limit ☆63 · Updated last year
- GoldFinch and other hybrid transformer components ☆45 · Updated 8 months ago
- Entropy-based sampling and parallel CoT decoding ☆17 · Updated 6 months ago
- ☆48 · Updated 5 months ago
- RWKV-7: Surpassing GPT ☆83 · Updated 5 months ago
- entropix-style sampling + GUI ☆25 · Updated 5 months ago
- Spherical merging of PyTorch/HF-format language models with minimal feature loss. ☆120 · Updated last year
- Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 10 months ago
- Simple GRPO scripts and configurations. ☆58 · Updated 2 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- ☆129 · Updated 8 months ago
- QuIP quantization ☆51 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated last year
- This is the official repository for Inheritune. ☆111 · Updated 2 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆102 · Updated last year
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335) ☆29 · Updated last year
- Low-rank adapter extraction for fine-tuned transformer models ☆171 · Updated 11 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated 2 months ago
- ☆27 · Updated last year
- ☆77 · Updated 8 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆86 · Updated 3 weeks ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆196 · Updated 9 months ago
- ☆53 · Updated 10 months ago