heaplax / ARMAP
☆25 · Updated 2 months ago
Alternatives and similar repositories for ARMAP
Users who are interested in ARMAP are comparing it to the repositories listed below.
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆104 · Updated 2 months ago
- Natural Language Reinforcement Learning ☆92 · Updated last week
- ☆60 · Updated 5 months ago
- ☆47 · Updated 5 months ago
- Reinforced Multi-LLM Agents training ☆35 · Updated 2 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆59 · Updated 6 months ago
- A repo for open research on building large reasoning models ☆87 · Updated this week
- Official Repository of LatentSeek ☆56 · Updated 2 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆114 · Updated 4 months ago
- ☆48 · Updated 2 months ago
- ☆114 · Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 · Updated 2 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆60 · Updated 6 months ago
- ☆197 · Updated this week
- ☆53 · Updated 2 months ago
- ☆43 · Updated 5 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆64 · Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆185 · Updated 3 months ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning. ☆99 · Updated 5 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆40 · Updated last month
- ☆323 · Updated last week
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆78 · Updated last month
- Repo for paper https://arxiv.org/abs/2504.13837 ☆180 · Updated last month
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆78 · Updated 4 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆69 · Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆147 · Updated 9 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆161 · Updated 4 months ago
- Official implementation of the paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space" ☆204 · Updated 2 weeks ago
- ☆81 · Updated 2 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 · Updated 9 months ago