zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
★32, updated 4 months ago
Related projects
Alternatives and complementary repositories for pythia-mlkv
- DPO, but faster (★21, updated 2 weeks ago)
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM (★49, updated 7 months ago)
- A repository for research on medium-sized language models. (★74, updated 5 months ago)
- Data preparation code for CrystalCoder 7B LLM (★42, updated 6 months ago)
- Official implementation of ECCV24 paper: POA (★24, updated 3 months ago)
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… (★51, updated this week)
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (★91, updated last month)
- ★44, updated last month
- My fork of Allen AI's OLMo for educational purposes. (★28, updated 7 months ago)
- Lottery Ticket Adaptation (★36, updated last month)
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding (★70, updated last week)
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging (★52, updated last month)
- Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding (★37, updated 3 weeks ago)
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… (★43, updated 3 months ago)
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video… (★27, updated 4 months ago)
- ★62, updated last month
- Recaption large (Web)Datasets with vllm and save the artifacts. (★30, updated last month)
- ★57, updated last month
- A list of language models with permissive licenses such as MIT or Apache 2.0 (★22, updated last week)
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated (★30, updated 2 months ago)
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… (★31, updated this week)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) (★133, updated last month)
- This is the official repository for Inheritune. (★105, updated last month)
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" (★37, updated 6 months ago)
- GoldFinch and other hybrid transformer components (★39, updated 3 months ago)
- Repo hosting code and materials related to speeding up LLM inference using token merging. (★29, updated 6 months ago)
- Fast LLM training codebase with dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] (★34, updated 10 months ago)
- ★61, updated 2 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs (★38, updated 4 months ago)
- Cerule - A Tiny Mighty Vision Model (★67, updated 2 months ago)