zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
☆31 Updated 9 months ago
Alternatives and similar repositories for pythia-mlkv:
Users who are interested in pythia-mlkv are comparing it to the repositories listed below.
- Train, tune, and run inference with the Bamba model ☆86 Updated last month
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆44 Updated 7 months ago
- Lottery Ticket Adaptation ☆38 Updated 3 months ago
- A repository for research on medium-sized language models. ☆77 Updated 9 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 Updated last year
- GoldFinch and other hybrid transformer components ☆45 Updated 7 months ago
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆36 Updated last year
- DPO, but faster 🚀 ☆40 Updated 3 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆53 Updated 11 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 Updated 5 months ago
- Here we will test various linear attention designs. ☆59 Updated 10 months ago
- ☆20 Updated 9 months ago
- Implementation of the Mamba SSM with hf_integration. ☆56 Updated 6 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆33 Updated last year
- RWKV-7: Surpassing GPT ☆80 Updated 3 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆35 Updated 10 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆39 Updated last year
- Using FlexAttention to compute attention with different masking patterns ☆42 Updated 5 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆17 Updated 2 weeks ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated last year
- ☆34 Updated 7 months ago
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… ☆31 Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆32 Updated 5 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆53 Updated last month
- ☆18 Updated 9 months ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆21 Updated last year
- ☆31 Updated 8 months ago