Infini-AI-Lab / gsm_infinite
☆36 · Updated last month
Alternatives and similar repositories for gsm_infinite:
Users interested in gsm_infinite are comparing it to the libraries listed below.
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆44 · Updated 8 months ago
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆44 · Updated 5 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆44 · Updated this week
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 · Updated 5 months ago
- Layer-Condensed KV cache with 10x larger batch size, fewer params, and less computation. Dramatic speedup with better task performance… ☆148 · Updated 2 months ago
- ☆87 · Updated 6 months ago
- Using FlexAttention to compute attention with different masking patterns ☆42 · Updated 6 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆23 · Updated last week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆111 · Updated 3 months ago
- ☆125 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch ☆53 · Updated this week
- The simplest implementation of recent sparse attention patterns for efficient LLM inference ☆58 · Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention ☆116 · Updated this week
- Cascade Speculative Drafting ☆29 · Updated 11 months ago
- ☆76 · Updated 2 months ago
- ☆40 · Updated 3 weeks ago
- PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24) ☆55 · Updated 11 months ago
- ☆60 · Updated 11 months ago
- ☆36 · Updated 7 months ago
- ☆50 · Updated 5 months ago
- Transformers components, but in Triton ☆32 · Updated last week
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 6 months ago
- ☆72 · Updated this week
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆71 · Updated 11 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 · Updated 5 months ago
- DPO, but faster 🚀 ☆40 · Updated 3 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 6 months ago
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆60 · Updated last year
- ☆74 · Updated 7 months ago