xuhaoxh / infini-gram-mini
☆40 · Updated 3 months ago
Alternatives and similar repositories for infini-gram-mini
Users interested in infini-gram-mini are comparing it to the libraries listed below.
- DPO, but faster 🚀 ☆46 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 · Updated last year
- Official Implementation of APB (ACL 2025 main Oral) ☆32 · Updated 10 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆41 · Updated last week
- ☆61 · Updated 6 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Updated 10 months ago
- ☆54 · Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ☆85 · Updated 5 months ago
- Using FlexAttention to compute attention with different masking patterns ☆47 · Updated last year
- A repository for research on medium-sized language models. ☆77 · Updated last year
- ☆19 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆128 · Updated 6 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆58 · Updated 2 weeks ago
- Xmixers: A collection of SOTA efficient token/channel mixers ☆28 · Updated 4 months ago
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆57 · Updated 3 weeks ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated last year
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆69 · Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) ☆79 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year
- Defeating the Training-Inference Mismatch via FP16 ☆172 · Updated last month
- Tooling for exact and MinHash deduplication of large-scale text datasets ☆51 · Updated this week
- ☆20 · Updated last year
- Multi-Layer Key-Value sharing experiments on Pythia models ☆34 · Updated last year
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆59 · Updated 9 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆108 · Updated 2 months ago