OswaldHe / HMT-pytorchView external linksLinks
[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
☆80Jan 25, 2026Updated 3 weeks ago
Alternatives and similar repositories for HMT-pytorch
Users that are interested in HMT-pytorch are comparing it to the libraries listed below
Sorting:
- Linear Attention Sequence Parallelism (LASP)☆88Jun 4, 2024Updated last year
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation.☆53Aug 28, 2024Updated last year
- A repository for research on medium sized language models.☆77May 23, 2024Updated last year
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆31Apr 8, 2024Updated last year
- Hrrformer: A Neuro-symbolic Self-attention Model (ICML23)☆61Oct 8, 2025Updated 4 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆56Aug 20, 2024Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization☆37Sep 24, 2024Updated last year
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 3 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- ☆17Aug 1, 2025Updated 6 months ago
- ☆21Apr 17, 2025Updated 9 months ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model☆24Oct 12, 2024Updated last year
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024"☆13Feb 14, 2025Updated last year
- ICLR 2023: Learning to Extrapolate: A Transductive Approach☆11Aug 15, 2023Updated 2 years ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆17Nov 4, 2025Updated 3 months ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).☆43Aug 6, 2024Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated last year
- ☆35Feb 10, 2025Updated last year
- ☆17May 21, 2025Updated 8 months ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆14Jun 6, 2025Updated 8 months ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on …☆13Mar 25, 2024Updated last year
- ☆11Aug 13, 2024Updated last year
- code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis☆12Nov 17, 2024Updated last year
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning☆13Sep 2, 2024Updated last year
- ☆52Feb 12, 2025Updated last year
- ☆20Oct 22, 2025Updated 3 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆157Apr 7, 2025Updated 10 months ago
- ☆54Jul 7, 2025Updated 7 months ago
- The original Shared Recurrent Memory Transformer implementation☆33Jul 11, 2025Updated 7 months ago
- An annotated implementation of the Hyena Hierarchy paper☆34May 28, 2023Updated 2 years ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆61Dec 10, 2024Updated last year
- Official Code Repository for the paper "Key-value memory in the brain"☆31Feb 25, 2025Updated 11 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated last year
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- The official repo for "LLoCo: Learning Long Contexts Offline"☆118Jun 15, 2024Updated last year
- ☆84Nov 10, 2025Updated 3 months ago
- [AAAI'26] Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression☆19Dec 21, 2025Updated last month
- Collect papers about Mamba (a selective state space model).☆14Aug 6, 2024Updated last year