UbiquantAI / URM
Universal Reasoning Model
☆122 Updated 3 weeks ago
Alternatives and similar repositories for URM
Users who are interested in URM are comparing it to the repositories listed below.
- ☆91 Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆115 Updated 9 months ago
- Esoteric Language Models ☆111 Updated this week
- [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a Pretraining Objective ☆231 Updated 2 weeks ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆292 Updated 2 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆128 Updated 4 months ago
- Official repo of paper LM2 ☆46 Updated 11 months ago
- ☆26 Updated last year
- A repository for research on medium sized language models. ☆77 Updated last year
- Simple repository for training small reasoning models ☆49 Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think ☆89 Updated 2 weeks ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆86 Updated 4 months ago
- ☆394 Updated last week
- Repository for the paper Stream of Search: Learning to Search in Language ☆153 Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 Updated last year
- ☆123 Updated 11 months ago
- ☆100 Updated last week
- GoldFinch and other hybrid transformer components ☆45 Updated last year
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆65 Updated last week
- ☆112 Updated last year
- Fluid Language Model Benchmarking ☆26 Updated 4 months ago
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 Updated 3 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆56 Updated 2 months ago
- ☆56 Updated last year
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 Updated last year
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ☆78 Updated last week
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆26 Updated last month
- 📄Small Batch Size Training for Language Models ☆80 Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆186 Updated 3 weeks ago