EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆103 Updated last month
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench are comparing it to the repositories listed below.
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆143 Updated 2 months ago
- Reproducing R1 for Code with Reliable Rewards ☆243 Updated 3 months ago
- ☆126 Updated 2 months ago
- Simple extension on vLLM to help you speed up reasoning model without training. ☆172 Updated 2 months ago
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆40 Updated 8 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆253 Updated 10 months ago
- A Comprehensive Benchmark for Software Development. ☆111 Updated last year
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆30 Updated 5 months ago
- Paper list for Personal LLM Agents ☆403 Updated last year
- Multimodal Large Language Models for Code Generation under Multimodal Scenarios ☆117 Updated this week
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆142 Updated 2 weeks ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆125 Updated 3 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆54 Updated last year
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆68 Updated 4 months ago
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆142 Updated last month
- Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders" ☆99 Updated 2 months ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆52 Updated 5 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆171 Updated last month
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆86 Updated 5 months ago
- ☆25 Updated 4 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆17 Updated 3 months ago
- ☆65 Updated 3 months ago
- Code for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use" ☆105 Updated this week
- ☆71 Updated 4 months ago
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆127 Updated this week
- PhyX: Does Your Model Have the "Wits" for Physical Reasoning? ☆43 Updated this week
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆121 Updated 3 weeks ago
- ☆113 Updated 2 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆199 Updated 5 months ago
- SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks ☆84 Updated last month