zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
☆32 · Updated 10 months ago
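MLKV here refers to sharing key/value heads across transformer layers, extending grouped-query attention's within-layer sharing to the depth dimension so that several consecutive layers reuse one layer's KV cache. Below is a minimal PyTorch sketch of that idea; the module and argument names are hypothetical illustrations, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    """Toy attention layer that can reuse K/V produced by an earlier layer (MLKV-style sharing)."""

    def __init__(self, d_model: int, n_heads: int, makes_kv: bool):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Only "producer" layers own K/V projections; the layers after them borrow the cached K/V.
        self.makes_kv = makes_kv
        if makes_kv:
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, shared_kv=None):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        if self.makes_kv:
            k = self.k_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            v = self.v_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            shared_kv = (k, v)   # computed once, then reused by the layers that follow
        else:
            k, v = shared_kv     # borrow K/V from the most recent producer layer
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(y), shared_kv
```

Stacking such layers with one producer per group of g layers shrinks the inference-time KV cache by roughly a factor of g, traded against some model quality.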
Alternatives and similar repositories for pythia-mlkv:
Users who are interested in pythia-mlkv are comparing it to the repositories listed below.
- Lottery Ticket Adaptation ☆39 · Updated 5 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler] ☆37 · Updated last year
- DPO, but faster ☆41 · Updated 4 months ago
- Data preparation code for the CrystalCoder 7B LLM ☆44 · Updated 11 months ago
- Implementation of Mind Evolution ("Evolving Deeper LLM Thinking") from DeepMind ☆48 · Updated 2 months ago
- Using FlexAttention to compute attention with different masking patterns (see the sketch after this list) ☆43 · Updated 7 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch… ☆55 · Updated this week
- Simple implementation of TinyGPT-V in super simple Zeta Lego blocks ☆16 · Updated 5 months ago
- ☆20 · Updated 10 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆44 · Updated last week
- Train, tune, and run inference with the Bamba model ☆88 · Updated this week
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 · Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆55 · Updated this week
- Repo hosting code and materials related to speeding up LLM inference using token merging ☆36 · Updated 11 months ago
- Code, evaluations, documentation, links, and resources for the Min P paper (a minimal sampling sketch follows this list) ☆32 · Updated last month
- List of papers on self-correction of LLMs ☆72 · Updated 4 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 · Updated 2 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 7 months ago
- A repository for research on medium-sized language models ☆76 · Updated 11 months ago
- A single repo with all scripts and utilities to train / fine-tune the Mamba model, with or without FIM ☆54 · Updated last year
- ☆46 · Updated 9 months ago
- ☆43 · Updated 2 months ago
- ☆62 · Updated 3 weeks ago
- ☆16 · Updated last month
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning (COLM 2024 accepted paper) ☆32 · Updated 10 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- ☆53 · Updated last month
- Official implementation of the ECCV 2024 paper POA ☆24 · Updated 8 months ago
- ☆33 · Updated 10 months ago
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs ☆82 · Updated last month
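For the FlexAttention entry above: the API expresses a masking pattern as a small predicate over query/key positions. A minimal sketch of plain causal masking with PyTorch's `flex_attention` (available from PyTorch 2.5; CPU support is newer still, so treat the device choice as an assumption, and note this is an illustration rather than that repository's code):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# A mask_mod returns True where attention is allowed; here, standard causal masking.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 2, 4, 128, 64
# B=None / H=None means the mask is shared across batch and heads.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device="cpu")

q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)
```

Swapping in a different `mask_mod` (sliding-window, prefix-LM, document masks, etc.) is the point of the library: the kernel stays the same while the predicate changes.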
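And for the Min P entry: the sampling rule itself is simple to state. Keep only tokens whose probability is at least `min_p` times the top token's probability, then renormalize and sample. A minimal sketch under that description (not the paper's reference code):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """Mask out tokens whose probability falls below min_p * p(top token)."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

logits = torch.randn(1, 32000)  # e.g. one decoding step of LM logits
next_token = torch.multinomial(torch.softmax(min_p_filter(logits), dim=-1), 1)
```

Unlike top-k or top-p, the number of surviving tokens adapts to how peaked the distribution is: a confident step keeps few candidates, a flat one keeps many.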