Research work aimed at addressing the problem of modeling infinite-length context
☆46 · Dec 18, 2025 · Updated 2 months ago
Alternatives and similar repositories for long-context-modeling
Users who are interested in long-context-modeling compare it to the repositories listed below.
- An Empirical Study of Memorization in NLP (ACL 2022) ☆13 · Jun 22, 2022 · Updated 3 years ago
- ☆20 · Aug 14, 2025 · Updated 6 months ago
- LCA-on-the-line (ICML 2024 Oral) ☆13 · Feb 13, 2025 · Updated last year
- ☆21 · Jul 3, 2025 · Updated 8 months ago
- ☆64 · Apr 9, 2024 · Updated last year
- ☆22 · Oct 22, 2024 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆121 · Jan 27, 2026 · Updated last month
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆28 · Jul 15, 2025 · Updated 7 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆58 · Nov 11, 2025 · Updated 3 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆110 · Oct 11, 2025 · Updated 4 months ago
- ☆76 · Jan 8, 2026 · Updated last month
- ☆58 · Sep 2, 2024 · Updated last year
- ☆37 · Oct 16, 2025 · Updated 4 months ago
- Muon fsdp 2 ☆54 · Aug 8, 2025 · Updated 6 months ago
- Source code of paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆31 · Oct 24, 2024 · Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- Official PyTorch Implementation of "Rosetta Neurons: Mining the Common Units in a Model Zoo" ☆31 · Oct 17, 2023 · Updated 2 years ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆33 · Sep 28, 2025 · Updated 5 months ago
- A full-stack online music app, developed using MERN stack (React, Express.js, MongoDB) and Electron. Libraries including Tailwind CSS, Re… ☆10 · Jul 2, 2024 · Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pretraining to improve … ☆37 · May 31, 2025 · Updated 9 months ago
- Shaping capabilities with token-level pretraining data filtering ☆83 · Jan 28, 2026 · Updated last month
- ☆41 · Apr 30, 2025 · Updated 10 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆90 · Jan 29, 2024 · Updated 2 years ago
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference ☆22 · Feb 9, 2026 · Updated 3 weeks ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆54 · Jan 12, 2026 · Updated last month
- FocusLLM: Scaling LLM’s Context by Parallel Decoding ☆44 · Dec 8, 2024 · Updated last year
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆25 · Jul 21, 2025 · Updated 7 months ago
- ☆10 · Apr 12, 2025 · Updated 10 months ago
- ☆17 · Nov 18, 2025 · Updated 3 months ago
- Language modeling with linear-cost context ☆115 · Sep 25, 2025 · Updated 5 months ago
- ☆12 · Jul 4, 2024 · Updated last year
- ☆11 · Aug 20, 2025 · Updated 6 months ago
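- Re-identification of campus bicycles at surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display ☆10 · Feb 13, 2024 · Updated 2 years ago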
- Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral) ☆58 · Jun 27, 2025 · Updated 8 months ago
- d3LLM: Ultra-Fast Diffusion LLM 🚀 ☆93 · Feb 4, 2026 · Updated last month
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆91 · Feb 23, 2026 · Updated last week
- BFloat16 Fused Adam Operator for PyTorch ☆16 · Nov 16, 2024 · Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 · Apr 18, 2025 · Updated 10 months ago
- A library for handling Structural Causal Models and performing interventional and counterfactual inference on them. ☆13 · Jul 3, 2020 · Updated 5 years ago