EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆104 Updated 3 months ago
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench are comparing it to the repositories listed below
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆150 Updated 2 weeks ago
- A Comprehensive Benchmark for Software Development. ☆115 Updated last year
- ☆87 Updated 6 months ago
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆40 Updated 10 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training. ☆197 Updated 4 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆130 Updated 8 months ago
- Paper list for Personal LLM Agents ☆412 Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆259 Updated 5 months ago
- ☆135 Updated this week
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆190 Updated 4 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆257 Updated last year
- ☆133 Updated last month
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆91 Updated 7 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆32 Updated 8 months ago
- ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry ☆33 Updated 3 weeks ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆170 Updated last week
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆55 Updated 7 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ☆111 Updated last month
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆133 Updated 6 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆55 Updated last year
- Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders" ☆106 Updated 5 months ago
- Repo for EmbedLLM: Learning Compact Representations of Large Language Models ☆21 Updated 3 weeks ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆170 Updated last year
- [NAACL 2025 Main Selected Oral] Repository for the paper: Prompt Compression for Large Language Models: A Survey ☆31 Updated 5 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆298 Updated 3 weeks ago
- ☆46 Updated 10 months ago
- GitHub page for "Large Language Model-Brained GUI Agents: A Survey" ☆199 Updated 3 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆192 Updated 3 months ago
- The repo for In-context Autoencoder ☆145 Updated last year
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 Updated 5 months ago