zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
★32 · Updated 5 months ago
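MLKV extends grouped-query attention along the depth axis: several consecutive layers reuse a single set of key-value heads, so the KV cache shrinks by the sharing factor. Below is a minimal PyTorch sketch of that layout; it is a toy illustration of the idea only, and the names (`MLKVDecoder`, `kv_group`) are hypothetical, not taken from this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLKVDecoder(nn.Module):
    """Toy decoder stack: every `kv_group` consecutive layers share one K/V pair."""

    def __init__(self, n_layers=8, kv_group=4, d_model=256, n_heads=4):
        super().__init__()
        assert n_layers % kv_group == 0
        self.kv_group, self.n_heads = kv_group, n_heads
        # one query projection per layer, as in a standard transformer
        self.q_projs = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_layers))
        # but only one key/value projection per *group* of layers
        n_groups = n_layers // kv_group
        self.k_projs = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_groups))
        self.v_projs = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_groups))

    def forward(self, x):
        b, t, d = x.shape
        heads = lambda z: z.view(b, t, self.n_heads, -1).transpose(1, 2)
        k = v = None
        for i, q_proj in enumerate(self.q_projs):
            if i % self.kv_group == 0:
                # first layer of a group computes K/V; at inference time this is
                # the only K/V that would need to be cached for the whole group
                g = i // self.kv_group
                k, v = heads(self.k_projs[g](x)), heads(self.v_projs[g](x))
            q = heads(q_proj(x))  # every layer keeps its own queries
            attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            x = x + attn.transpose(1, 2).reshape(b, t, d)
        return x

out = MLKVDecoder()(torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 16, 256])
```

With `kv_group=4` the decoder stores a quarter of the usual per-layer KV cache; `kv_group=1` recovers vanilla per-layer attention.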
Related projects
Alternatives and complementary repositories for pythia-mlkv
- DPO, but faster ★23 · Updated 3 weeks ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTo… ★52 · Updated last week
- A repository for research on medium-sized language models. ★74 · Updated 5 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ★43 · Updated 4 months ago
- Using FlexAttention to compute attention with different masking patterns (see the usage sketch after this list) ★40 · Updated 2 months ago
- Lottery Ticket Adaptation ★36 · Updated last month
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ★36 · Updated last month
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. Accepted at COLM 2024. ★28 · Updated 5 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ★19 · Updated 2 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]. ★34 · Updated 10 months ago
- Data preparation code for CrystalCoder 7B LLM ★42 · Updated 6 months ago
- Official implementation for "Law of the Weakest Link: Cross Capabilities of Large Language Models" ★37 · Updated last month
- My fork of Allen AI's OLMo for educational purposes. ★28 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ★92 · Updated last month
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ★36 · Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ★44 · Updated 10 months ago
- Cascade Speculative Drafting ★26 · Updated 7 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ★86 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ★50 · Updated 7 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ★34 · Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ★46 · Updated 2 months ago
- Official implementation of ECCV24 paper: POA ★24 · Updated 3 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning (https://arxiv.org/pdf/2410.01044) ★30 · Updated last month
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ★30 · Updated this week
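For the FlexAttention entry above: FlexAttention expresses masking patterns through a `score_mod` callback that is applied to raw attention scores. Here is a minimal usage sketch, assuming PyTorch 2.5+ (where `torch.nn.attention.flex_attention` is available); the causal mask is just the simplest pattern, not necessarily what that repository demonstrates.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod is called with the raw score and (batch, head, q_idx, kv_idx)
# indices; returning -inf for a position masks it out of the softmax.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = k = v = torch.randn(1, 4, 128, 64)  # (batch, heads, seq_len, head_dim)
out = flex_attention(q, k, v, score_mod=causal)
print(out.shape)  # torch.Size([1, 4, 128, 64])
```

Other patterns (sliding window, prefix-LM, document masks) are written the same way by changing the predicate inside `score_mod`, and `torch.compile` can fuse the callback into a single attention kernel.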