[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
☆80, updated Mar 12, 2026
Alternatives and similar repositories for HMT-pytorch
Users interested in HMT-pytorch are comparing it to the libraries listed below.
- [FPGA 2024] Source code and bitstream for LevelST: Stream-based Accelerator for Sparse Triangular Solver (☆15, updated Jun 1, 2025)
- Open-source AI acceleration on FPGA: from ONNX to RTL (☆49, updated this week)
- Code repository for the CURLoRA research paper: stable LLM continual fine-tuning and catastrophic-forgetting mitigation (☆53, updated Aug 28, 2024)
- [ICLR 2023] Learning to Extrapolate: A Transductive Approach (☆11, updated Aug 15, 2023)
- HGRN2: Gated Linear RNNs with State Expansion (☆56, updated Aug 20, 2024)
- Hrrformer: A Neuro-symbolic Self-attention Model (ICML 2023) (☆62, updated Oct 8, 2025)
- Linear Attention Sequence Parallelism (LASP) (☆88, updated Jun 4, 2024)
- JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" (☆15, updated May 10, 2024)
- A repository for research on medium-sized language models (☆78, updated May 23, 2024)
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (☆38, updated Sep 24, 2024)
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor (☆31, updated Apr 8, 2024)
- ☆84, updated Nov 10, 2025
- Differentiable Clustering with Perturbed Random Forests (NeurIPS 2023) (☆13, updated Oct 16, 2023)
- Gradient-based Hyperparameter Optimization Over Long Horizons (☆14, updated Sep 29, 2021)
- [CoLM 2024] Official repository of MambaByte: Token-free Selective State Space Model (☆24, updated Oct 12, 2024)
- Official implementation of "Neural Networks with Recurrent Generative Feedback" (NeurIPS 2020) (☆22, updated Nov 10, 2020)
- Hybrid Deep Sequential Modeling for Social Text-Driven Stock Prediction (dataset) (☆22, updated Aug 19, 2018)
- Official code repository for the paper "Key-value Memory in the Brain" (☆31, updated Feb 25, 2025)
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" (☆27, updated Apr 17, 2024)
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024" (☆13, updated Feb 14, 2025)
- ☆17, updated Aug 1, 2025
- Code for an ICML 2024 paper (☆35, updated Sep 18, 2025)
- 🎹 Instruct.KR 2025 Summer Meetup: Open-source LLMs, from vLLM to Production 🎹 (☆23, updated Aug 2, 2025)
- ☆107, updated Mar 9, 2024
- Open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity" (☆30, updated Nov 12, 2024)
- Official repo for "LLoCo: Learning Long Contexts Offline" (☆118, updated Jun 15, 2024)
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… (☆56, updated Feb 28, 2023)
- ☆16, updated Dec 9, 2023
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" (☆19, updated Mar 10, 2025)
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] (☆62, updated Dec 10, 2024)
- Adaptation of titans-pytorch to Llama models on HF (☆25, updated Mar 6, 2025)
- Engineering the state of RNN language models (Mamba, RWKV, etc.) (☆32, updated May 25, 2024)
- FlexAttention with FlashAttention-3 support (☆27, updated Oct 5, 2024)
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths and finetune the quantized LLMs (☆15, updated Jul 18, 2024)
- ☆42, updated Mar 28, 2024
- Layer-Condensed KV cache with 10× larger batch size, fewer parameters, and less computation. Dramatic speed-up with better task performance… (☆157, updated Apr 7, 2025)
- sigma-MoE layer (☆21, updated Jan 5, 2024)
- SSR: Spatial Sequential Hybrid Architecture for Latency-Throughput Tradeoff in Transformer Acceleration (full paper accepted at FPGA'24) (☆36, updated Mar 12, 2026)
- Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" (☆48, updated Jul 29, 2025)