MurrayTom / ToolSafe
Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedback"
☆26 · Updated last week
Alternatives and similar repositories for ToolSafe
Users that are interested in ToolSafe are comparing it to the libraries listed below
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search ☆17 · Updated last week
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use ☆27 · Updated 2 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆23 · Updated last week
- The official implementation of "EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis". ☆78 · Updated last week
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario ☆29 · Updated 3 months ago
- Code for paper: Optimizing Length Compression in Large Reasoning Models ☆27 · Updated 3 months ago
- [NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model ☆64 · Updated 3 months ago
- Scaling Agentic Environments Automatically. ☆47 · Updated last week
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning ☆71 · Updated 8 months ago
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to … ☆57 · Updated this week
- ☆36 · Updated 3 months ago
- This is the code repo for the paper "Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning". ☆33 · Updated 5 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆25 · Updated 5 months ago
- [FSE'2026] SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks ☆138 · Updated this week
- Scaling Long-Horizon LLM Agent via Context-Folding ☆101 · Updated this week
- The demo, code and data of FollowRAG ☆75 · Updated 7 months ago
- ☆177 · Updated last month
- The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding" ☆23 · Updated 7 months ago
- ☆36 · Updated last week
- JudgeLRM: Large Reasoning Models as a Judge ☆40 · Updated last month
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆44 · Updated last year
- MemEvolve & EvolveLab ☆148 · Updated last month
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 · Updated 7 months ago
- [ACL 2025] Knowledge Unlearning for Large Language Models ☆47 · Updated 4 months ago
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches ☆35 · Updated 3 months ago
- ☆33 · Updated 6 months ago
- RewardAnything: Generalizable Principle-Following Reward Models ☆45 · Updated 7 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆135 · Updated 4 months ago
- ☆46 · Updated 3 months ago
- This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining. ☆46 · Updated 5 months ago