Aloriosa / srmt
The original Shared Recurrent Memory Transformer implementation
☆24 · Updated 3 months ago
Alternatives and similar repositories for srmt:
Users who are interested in srmt are comparing it to the repositories listed below:
- ☆17 · Updated 2 months ago
- ☆9 · Updated 2 weeks ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆55 · Updated 2 months ago
- ☆17 · Updated last week
- ☆27 · Updated 3 weeks ago
- ☆63 · Updated last month
- Small, simple agent task environments for training and evaluation ☆18 · Updated 6 months ago
- Simple repository for training small reasoning models ☆27 · Updated 3 months ago
- ☆65 · Updated 3 weeks ago
- ☆11 · Updated 9 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 · Updated 11 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 · Updated 2 months ago
- ☆18 · Updated 7 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆24 · Updated 2 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆12 · Updated 3 weeks ago
- ☆48 · Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 8 months ago
- How to create rational LLM-based agents? Using game-theoretic workflows! ☆64 · Updated 2 months ago
- ☆13 · Updated 4 months ago
- Agentic Knowledgeable Self-awareness ☆56 · Updated 3 weeks ago
- A repository for research on medium sized language models. ☆76 · Updated 11 months ago
- ☆25 · Updated 7 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 · Updated 2 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 · Updated last month
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆52 · Updated 3 months ago
- Accompanying material for the sleep-time compute paper ☆77 · Updated last week
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- ☆16 · Updated 2 months ago
- Official repo of the LM2 paper ☆39 · Updated 2 months ago