Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
☆67Nov 11, 2025Updated 7 months ago
Alternatives and similar repositories for retrofitting-recurrence
Users who are interested in retrofitting-recurrence are comparing it to the libraries listed below.
- Repo for paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability"☆108Apr 23, 2026Updated 2 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation☆30Aug 19, 2025Updated 10 months ago
- This repository contains a replication of the iGSM dataset generation process from the Physics of Language Models paper by Zeyuan Allen-Zhu.☆17Sep 13, 2024Updated last year
- Transformers components but in Triton☆34May 9, 2025Updated last year
- PyTorch Implementation of Zero-Shot Vision Encoder Grafting via LLM Surrogates [ICCV'25]☆54Jul 10, 2025Updated 11 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…☆58Mar 31, 2026Updated 3 months ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)☆35Sep 28, 2025Updated 9 months ago
- Official Repository of "Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Ste…☆28Mar 9, 2026Updated 3 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…☆22Mar 15, 2025Updated last year
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-…☆19Oct 4, 2025Updated 9 months ago
- Code and data for paper "(How) do Language Models Track State?"☆25Mar 31, 2025Updated last year
- Repository for Sparse Universal Transformers☆20Oct 23, 2023Updated 2 years ago
- Research work aimed at addressing the problem of modeling infinite-length context☆49Dec 18, 2025Updated 6 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆43Sep 18, 2025Updated 9 months ago
- Diagnostic Framework for LLMs and MLLMs☆39Mar 2, 2026Updated 4 months ago
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes☆23Jun 15, 2025Updated last year
- Programmatic access to your CMDOP fleet from Python and Node☆43Jun 17, 2026Updated 2 weeks ago
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.☆21Jul 18, 2025Updated 11 months ago
- [NeurIPS'25] ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions☆47Dec 7, 2025Updated 6 months ago
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)☆75Jun 15, 2026Updated 2 weeks ago
- Performance analysis of predictive (alpha) factors☆24Jul 25, 2025Updated 11 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model☆894Dec 29, 2025Updated 6 months ago
- [NeurIPS 2024] How do Large Language Models Handle Multilingualism?☆52Nov 8, 2024Updated last year
- ☆32Aug 21, 2025Updated 10 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆52Nov 9, 2024Updated last year
- ☆129Jun 2, 2026Updated last month
- Experiments on the impact of depth in transformers and SSMs.☆41Oct 23, 2025Updated 8 months ago
- The evaluation framework for training-free sparse attention in LLMs☆124Jan 27, 2026Updated 5 months ago
- A simple and minimal open source implementation of "Introducing LFM2: The Fastest On-Device Foundation Models on the Market" from Liquid …☆29Jun 22, 2026Updated last week
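- Re-identification of campus bicycles in surveillance-quality footage, including ReID model recognition, vector database retrieval, and a UI display☆11Feb 13, 2024Updated 2 years ago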
- ☆71Jun 18, 2025Updated last year
- [ICML'26] Official implementation of paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models"☆73Apr 4, 2026Updated 3 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆36Oct 3, 2025Updated 9 months ago
- Quora Paraphrasing Dataset Bahasa Indonesia Version☆11Apr 18, 2021Updated 5 years ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 11 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆136Jan 30, 2026Updated 5 months ago
- Code for paper "Accessing higher dimensions for unsupervised word translation"☆23Jun 26, 2023Updated 3 years ago
- defaultMODE is a Python framework for creating Discord AI agents with persistent memory and evolving behavior through brain-inspired sele…☆13Apr 21, 2026Updated 2 months ago
- ☆12Jun 15, 2023Updated 3 years ago