rohinmanvi / Capability-Aware_and_Mid-Generation_Self-Evaluations
☆20Updated 3 months ago
Alternatives and similar repositories for Capability-Aware_and_Mid-Generation_Self-Evaluations
Users who are interested in Capability-Aware_and_Mid-Generation_Self-Evaluations are comparing it to the repositories listed below:
- ☆48Updated 4 months ago
- EvaByte: Efficient Byte-level Language Models at Scale☆85Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆55Updated 6 months ago
- Official Repository of "Are Your LLMs Capable of Stable Reasoning?"☆22Updated last week
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆78Updated 2 weeks ago
- ☆74Updated 7 months ago
- ☆24Updated 6 months ago
- ☆16Updated 3 weeks ago
- ☆111Updated last month
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆28Updated last month
- The official implementation of Self-Exploring Language Models (SELM)☆62Updated 9 months ago
- ☆42Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆83Updated this week
- Official Code Release for "Training a Generally Curious Agent"☆19Updated 2 weeks ago
- A repository for research on medium sized language models.☆76Updated 10 months ago
- Minimal implementation of the "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" paper (arXiv:2401.01335)☆29Updated last year
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆86Updated 5 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆75Updated 2 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆52Updated 3 months ago
- ☆20Updated 9 months ago
- entropix style sampling + GUI☆25Updated 4 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆80Updated last month
- ☆32Updated 9 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning☆48Updated last month
- Lottery Ticket Adaptation☆38Updated 4 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More☆25Updated last month
- ☆27Updated this week
- Replicating O1 inference-time scaling laws☆83Updated 3 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆75Updated 2 weeks ago
- ☆30Updated 2 months ago