GreenBitAI / bitorch-engine
A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.
☆28 · Updated 4 months ago
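As a quick illustration of what "low-bit quantized" means here, below is a plain-PyTorch sketch of a 1-bit (binarized) linear layer trained with a straight-through estimator, the kind of operation bitorch-engine ships optimized kernels for. The names `Binarize` and `BinaryLinear` are illustrative only and are not bitorch-engine's actual API.

```python
# Illustrative sketch only -- not bitorch-engine's API.
import torch
import torch.nn as nn


class Binarize(torch.autograd.Function):
    """Quantize weights to {-1, +1} with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return w.sign()

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients only where |w| <= 1.
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)


class BinaryLinear(nn.Linear):
    """A linear layer whose weights are binarized on the forward pass."""

    def forward(self, x):
        return nn.functional.linear(x, Binarize.apply(self.weight), self.bias)


layer = BinaryLinear(64, 32)
out = layer(torch.randn(8, 64))
out.sum().backward()  # gradients reach layer.weight through the STE
```

Note that plain PyTorch still stores the binarized weights in full precision; engines like bitorch-engine pack them into bit tensors to realize actual memory and speed gains.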
Related projects
Alternatives and complementary repositories for bitorch-engine
- A toolkit for fine-tuning, running inference with, and evaluating GreenBitAI's LLMs. ☆72 · Updated 3 weeks ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 · Updated 9 months ago
- QuIP quantization ☆46 · Updated 7 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆30 · Updated 2 months ago
- ☆34 · Updated 8 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆24 · Updated last year
- DPO, but faster 🚀 ☆20 · Updated last week
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 · Updated last month
- Accelerating your LLM training to full speed ☆25 · Updated this week
- Cascade Speculative Drafting ☆26 · Updated 7 months ago
- ☆41 · Updated 11 months ago
- ☆33 · Updated 3 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch ☆51 · Updated this week
- ☆44 · Updated 2 months ago
- GoldFinch and other hybrid transformer components ☆39 · Updated 3 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆45 · Updated 3 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 5 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 · Updated 8 months ago
- ☆26 · Updated 4 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- This repository contains code for the MicroAdam paper. ☆12 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆43 · Updated 3 months ago
- Here we will test various linear attention designs. ☆56 · Updated 6 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆28 · Updated 6 months ago
- Demonstration that fine-tuning a RoPE model on longer sequences than it was pre-trained on adapts the model's context limit ☆63 · Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆19 · Updated 5 months ago
- ☆61 · Updated 2 months ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆42 · Updated this week
- This repo is based on https://github.com/jiaweizzhao/GaLore (see the low-rank gradient sketch after this list) ☆18 · Updated last month
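Several entries above (the GaLore-to-WeLore paper and the GaLore fork) center on low-rank gradient projection. Here is a minimal sketch of that idea, assuming a plain SGD step; it is illustrative only and not code from any of the listed repositories.

```python
# Hedged sketch of low-rank gradient projection (GaLore-style idea),
# not taken from any listed repository.
import torch

torch.manual_seed(0)
W = torch.randn(256, 128, requires_grad=True)
rank = 4

# Toy loss just to produce a gradient for W.
x = torch.randn(128)
loss = (W @ x).pow(2).sum()
loss.backward()

# Project the gradient onto its top-r singular directions, keep the
# small rank-r representation, then project the update back to the
# full weight shape for the parameter step.
U, S, Vh = torch.linalg.svd(W.grad, full_matrices=False)
P = U[:, :rank]                 # (256, rank) orthonormal basis

g_small = P.T @ W.grad          # rank-r representation of the gradient
update = P @ g_small            # back to (256, 128) for the weight step
with torch.no_grad():
    W -= 1e-2 * update
```

In the actual GaLore method the projector is refreshed periodically and the optimizer's moment statistics live in the rank-r space, which is where the memory savings come from.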