rohinmanvi / Capability-Aware_and_Mid-Generation_Self-Evaluations

☆21

Alternatives and similar repositories for Capability-Aware_and_Mid-Generation_Self-Evaluations

Users who are interested in Capability-Aware_and_Mid-Generation_Self-Evaluations are comparing it to the repositories listed below.

arcee-ai / DAM
☆51 Updated 7 months ago
OpenMOSS / Lorsa
☆20 Updated last week
tajwarfahim / paprika
Official Code Release for "Training a Generally Curious Agent"
☆25 Updated last month
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆57 Updated 9 months ago
zjunlp / DynamicKnowledgeCircuits
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
☆36 Updated 2 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆68 Updated 3 months ago
McGill-NLP / agent-reward-bench
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
☆18 Updated last month
vicksEmmanuel / latent-gemma
☆26 Updated 5 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆76 Updated last year
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆60 Updated 2 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆32 Updated 3 months ago
kyleliang919 / Online-Subspace-Descent
This repo is based on https://github.com/jiaweizzhao/GaLore
☆28 Updated 9 months ago
SalesforceAIResearch / LaTRO
☆115 Updated 4 months ago
penfever / wildchat-50m
Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.
☆29 Updated 2 months ago
SiliangZeng / Multi-Turn-RL-Agent
☆48 Updated 2 weeks ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆63 Updated 2 months ago
complex-reasoning / RPG
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆35 Updated this week
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆25 Updated 3 months ago
allenai / infinigram-api
☆61 Updated 3 weeks ago
menhguin / minp_paper
Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper
☆38 Updated 3 months ago
dinobby / MAgICoRE
☆24 Updated 9 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆45 Updated 3 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 Updated 3 weeks ago
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39 Updated 7 months ago
RobertCsordas / moeut
☆79 Updated 10 months ago
ZihanWang314 / coeCheck
☆16 Updated 3 months ago
convergence-ai / lm2
Official repo of paper LM2
☆41 Updated 4 months ago
tyler-romero / microR1
Simple repository for training small reasoning models
☆33 Updated 4 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆93 Updated 2 weeks ago
yueqis / API-Based-Agent
☆50 Updated 3 weeks ago