rohinmanvi / Capability-Aware_and_Mid-Generation_Self-Evaluations
☆20 Updated 5 months ago
Alternatives and similar repositories for Capability-Aware_and_Mid-Generation_Self-Evaluations
Users who are interested in Capability-Aware_and_Mid-Generation_Self-Evaluations are comparing it to the repositories listed below.
- ☆18 Updated this week
- ☆48 Updated 6 months ago
- ☆25 Updated 7 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆29 Updated last month
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆12 Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago
- ☆16 Updated 2 months ago
- A repository for research on medium sized language models. ☆76 Updated 11 months ago
- ☆25 Updated 4 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆20 Updated last month
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 Updated 2 months ago
- Lego for GRPO ☆28 Updated last month
- ☆33 Updated 10 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆27 Updated 7 months ago
- ☆78 Updated 8 months ago
- ☆27 Updated this week
- EvaByte: Efficient Byte-level Language Models at Scale ☆97 Updated 3 weeks ago
- Lottery Ticket Adaptation ☆39 Updated 5 months ago
- Replicating O1 inference-time scaling laws ☆85 Updated 5 months ago
- Simple repository for training small reasoning models ☆27 Updated 3 months ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆29 Updated last month
- ☆17 Updated 4 months ago
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆32 Updated 3 weeks ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 Updated last year
- Official repo for Learning to Reason for Long-Form Story Generation ☆51 Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning ☆50 Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆85 Updated last month
- ☆31 Updated 4 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 Updated last month
- ☆42 Updated last month