Oxen-AI / BitNet-1.58-Instruct
Implementation of BitNet-1.58 instruct tuning
☆18 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for BitNet-1.58-Instruct
- Collection of autoregressive model implementations ☆66 · Updated last week
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 · Updated last month
- ☆44 · Updated 2 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆59 · Updated 6 months ago
- Demonstration that fine-tuning a RoPE model on sequences longer than those seen in pre-training extends the model's context limit ☆63 · Updated last year
- Set of scripts to fine-tune LLMs ☆36 · Updated 7 months ago
- GoldFinch and other hybrid transformer components ☆39 · Updated 3 months ago
- Implementation of the Mamba SSM with hf_integration. ☆55 · Updated 2 months ago
- Triton implementation of the HyperAttention algorithm ☆46 · Updated 11 months ago
- DPO, but faster 🚀 ☆21 · Updated 2 weeks ago
- Implementation of a Light Recurrent Unit in PyTorch ☆46 · Updated last month
- ☆61 · Updated 2 months ago
- ☆76 · Updated 6 months ago
- ☆26 · Updated 4 months ago
- ☆61 · Updated 3 months ago
- ☆49 · Updated 7 months ago
- ☆43 · Updated 2 months ago
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆22 · Updated last week
- Using multiple LLMs for ensemble forecasting ☆16 · Updated 9 months ago
- My implementation of "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated" ☆30 · Updated 2 months ago
- Notebooks and scripts that showcase running quantized diffusion models on consumer GPUs ☆33 · Updated 2 weeks ago
- QLoRA with enhanced multi-GPU support ☆36 · Updated last year
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks. ☆28 · Updated 4 months ago
- QuIP quantization ☆46 · Updated 7 months ago
- ☆19 · Updated this week
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆49 · Updated last week
- Entropix-style sampling + GUI ☆25 · Updated 2 weeks ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆111 · Updated 2 months ago