LegallyCoder / mamba-hf
Implementation of the Mamba SSM with hf_integration.
☆55Updated 2 months ago
Related projects
Alternatives and complementary repositories for mamba-hf
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆91Updated last month
- ☆49Updated 7 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.☆33Updated 8 months ago
- ☆26Updated 4 months ago
- GoldFinch and other hybrid transformer components☆39Updated 3 months ago
- This is the official repository for Inheritune.☆105Updated last month
- ☆62Updated last month
- ☆61Updated 2 months ago
- Demonstration that fine-tuning a RoPE model on longer sequences than it was pre-trained on extends the model's context limit☆63Updated last year
- ☆33Updated 5 months ago
- ☆44Updated 2 months ago
- ☆62Updated 3 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers.☆58Updated 6 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆30Updated 2 months ago
- ☆44Updated last month
- My fork of Allen AI's OLMo for educational purposes.☆28Updated 6 months ago
- Collection of autoregressive model implementations☆66Updated this week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆129Updated last month
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)☆133Updated last month
- Triton Implementation of HyperAttention Algorithm☆46Updated 10 months ago
- A repository for research on medium sized language models.☆74Updated 5 months ago
- PyTorch implementation of the PEER block from the paper Mixture of A Million Experts, by Xu Owen He at DeepMind☆111Updated 2 months ago
- DPO, but faster 🚀☆20Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆22Updated 7 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆73Updated 9 months ago
- QuIP quantization☆46Updated 7 months ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆42Updated this week
- ☆38Updated this week
- ☆62Updated last month