[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
☆80Mar 12, 2026Updated 2 months ago
Alternatives and similar repositories for HMT-pytorch
Users that are interested in HMT-pytorch are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [FPGA 2024] Source code and bitstream for LevelST: Stream-based Accelerator for Sparse Triangular Solver☆15Jun 1, 2025Updated last year
- Open-source AI acceleration on FPGA: from ONNX to RTL☆54Updated this week
- ICLR 2023: Learning to Extrapolate: A Transductive Approach☆11Aug 15, 2023Updated 2 years ago
- HGRN2: Gated Linear RNNs with State Expansion☆57Aug 20, 2024Updated last year
- Hrrformer: A Neuro-symbolic Self-attention Model (ICML23)☆63Oct 8, 2025Updated 8 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Linear Attention Sequence Parallelism (LASP)☆88Jun 4, 2024Updated 2 years ago
- Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"☆15May 10, 2024Updated 2 years ago
- A repository for research on medium sized language models.☆78May 23, 2024Updated 2 years ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization☆40Sep 24, 2024Updated last year
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆32Apr 8, 2024Updated 2 years ago
- ☆84Jun 2, 2026Updated last week
- Training Recipes for Agentic Reinforcement Learning in LLMs: A Survey☆36Jan 30, 2026Updated 4 months ago
- Differentiable Clustering with Perturbed Random Forests, NeurIPS2023☆13Oct 16, 2023Updated 2 years ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).☆45Aug 6, 2024Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Gradient-based Hyperparameter Optimization Over Long Horizons☆14Sep 29, 2021Updated 4 years ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model☆27Oct 12, 2024Updated last year
- Official implementation for Neural networks with recurrent generative feedback (NeurIPS 2020).☆22Nov 10, 2020Updated 5 years ago
- Official Code Repository for the paper "Key-value memory in the brain"☆32Feb 25, 2025Updated last year
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024"☆13Feb 14, 2025Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated 2 years ago
- An Open-Hardware CGRA for accelerated computation on the edge.☆52Oct 31, 2025Updated 7 months ago
- ☆17Aug 1, 2025Updated 10 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆25Jun 6, 2024Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- 🎹 Instruct.KR 2025 Summer Meetup: 오픈소스 LLM, vLLM으로 Production까지 🎹☆23Aug 2, 2025Updated 10 months ago
- ☆107Mar 9, 2024Updated 2 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆31Nov 12, 2024Updated last year
- Code for ICML 2024 paper☆36Sep 18, 2025Updated 8 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline"☆117Jun 15, 2024Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆56Feb 28, 2023Updated 3 years ago
- ☆16Dec 9, 2023Updated 2 years ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated last year
- Adaptation of titans-pytorch to llama models on HF☆24Mar 6, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆62Dec 10, 2024Updated last year
- ☆42Mar 28, 2024Updated 2 years ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆156Apr 7, 2025Updated last year
- SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration (Full Paper Accepted in FPGA'24)☆36Mar 12, 2026Updated 2 months ago
- sigma-MoE layer☆21Jan 5, 2024Updated 2 years ago