Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
☆60Nov 11, 2025Updated 5 months ago
Alternatives and similar repositories for retrofitting-recurrence
Users that are interested in retrofitting-recurrence are comparing it to the libraries listed below.
- LCA-on-the-line (ICML 2024 Oral)☆14Feb 13, 2025Updated last year
- Measuring the Signal to Noise Ratio in Language Model Evaluation☆29Aug 19, 2025Updated 7 months ago
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu.☆17Sep 13, 2024Updated last year
- Transformers components but in Triton☆34May 9, 2025Updated 11 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…☆56Mar 31, 2026Updated 2 weeks ago
- PyTorch Implementation of Zero-Shot Vision Encoder Grafting via LLM Surrogates [ICCV'25]☆53Jul 10, 2025Updated 9 months ago
- Official Repository of "Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Ste…☆27Mar 9, 2026Updated last month
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)☆34Sep 28, 2025Updated 6 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…☆21Mar 15, 2025Updated last year
- Repo for paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability"☆76Updated this week
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-…☆17Oct 4, 2025Updated 6 months ago
- Repository for Sparse Universal Transformers☆20Oct 23, 2023Updated 2 years ago
- Research work aimed at addressing the problem of modeling infinite-length context☆48Dec 18, 2025Updated 3 months ago
- Diagnostic Framework for LLMs and MLLMs☆36Mar 2, 2026Updated last month
- ☆92Mar 23, 2026Updated 3 weeks ago
- [NeurIPS'25] ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions☆34Dec 7, 2025Updated 4 months ago
- Python SDK for CMDOP agent interaction☆41Apr 7, 2026Updated last week
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.☆22Jul 18, 2025Updated 8 months ago
- Official repository for paper "High-Dimension Human Value Representation in Large Language Models" (NAACL'25 Main)☆23Jul 9, 2024Updated last year
- Pretraining and inference code for a large-scale depth-recurrent language model☆870Dec 29, 2025Updated 3 months ago
- [NeurIPS 2024] How do Large Language Models Handle Multilingualism?☆51Nov 8, 2024Updated last year
- ☆30Aug 21, 2025Updated 7 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆51Nov 9, 2024Updated last year
- Experiments on the impact of depth in transformers and SSMs.☆41Oct 23, 2025Updated 5 months ago
- The evaluation framework for training-free sparse attention in LLMs☆122Jan 27, 2026Updated 2 months ago
- ☆41Dec 9, 2025Updated 4 months ago
- ☆27Sep 28, 2024Updated last year
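- Re-identification of campus bicycles at surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display☆11Feb 13, 2024Updated 2 years ago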
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆132Jan 30, 2026Updated 2 months ago
- ☆57Updated this week
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"☆74Apr 7, 2026Updated last week
- Official implementation of paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models"☆68Apr 4, 2026Updated last week
- Quora Paraphrasing Dataset Bahasa Indonesia Version☆11Apr 18, 2021Updated 4 years ago
- ☆11Dec 15, 2025Updated 3 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 8 months ago
- ☆12Jun 15, 2023Updated 2 years ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment☆16Aug 6, 2024Updated last year
- Cryptocurrency Design and Engineering class Fall 2025☆57Feb 19, 2026Updated last month
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Aug 12, 2023Updated 2 years ago