OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation
★36 · Apr 1, 2026 · Updated last month
Alternatives and similar repositories for OfficeBench
Users who are interested in OfficeBench are comparing it to the repositories listed below.
- Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ★54 · Jul 9, 2025 · Updated 10 months ago
- Source code and data for Counterfactual Recipe Generation: Exploring Models' Compositional Generalization Ability in a Realistic Scenario… ★15 · Oct 25, 2022 · Updated 3 years ago
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? ★13 · Aug 16, 2023 · Updated 2 years ago
- ★21 · Apr 27, 2026 · Updated 3 weeks ago
- Code and data for the USENIX 2025 paper "We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating… ★28 · Aug 12, 2025 · Updated 9 months ago
- Source code and data for Things not Written in Text: Exploring Spatial Commonsense from Visual Signals (ACL 2022 main conference paper). ★20 · Oct 10, 2022 · Updated 3 years ago
- [EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning ★23 · Jul 28, 2025 · Updated 9 months ago
- Code for AAAI20 paper "Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning" ★16 · Apr 3, 2020 · Updated 6 years ago
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents ★57 · Jan 28, 2025 · Updated last year
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework ★14 · May 31, 2023 · Updated 2 years ago
- Official Implementation for "EmojiLM: Modeling the New Emoji Language" ★12 · Feb 23, 2024 · Updated 2 years ago
- Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization" ★11 · Sep 20, 2024 · Updated last year
- Koishi's Day 2025 Paper (NeurIPS 2025): "Codifying Character Logic in Role-Playing" ★23 · Jan 15, 2026 · Updated 4 months ago
- ★12 · Nov 3, 2020 · Updated 5 years ago
- Implementation of AdaCQR (COLING 2025) ★15 · Dec 30, 2024 · Updated last year
- Official PyTorch implementation of RefRef: A Synthetic Dataset and Benchmark for Reconstructing Refractive and Reflective Objects ★15 · Mar 2, 2026 · Updated 2 months ago
- Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022. ★17 · Aug 30, 2022 · Updated 3 years ago
- Official implementation of SimFlow ★31 · Dec 16, 2025 · Updated 5 months ago
- Saving Dense Retriever from Shortcut Dependency in Conversational Search (EMNLP 2022) ★18 · Nov 24, 2022 · Updated 3 years ago
- ★27 · Oct 19, 2024 · Updated last year
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ★477 · Mar 19, 2024 · Updated 2 years ago
- Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation" ★10 · Mar 8, 2024 · Updated 2 years ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ★70 · Dec 9, 2024 · Updated last year
- ★11 · Sep 24, 2024 · Updated last year
- A framework for navigation tasks that can build the 3D scene graph in real-time and utilize large language model (LLM) to gui… ★26 · Oct 14, 2024 · Updated last year
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ★48 · Feb 18, 2025 · Updated last year
- Official Implementation of the ACL2024 Findings paper "Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attr… ★18 · May 18, 2024 · Updated 2 years ago
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents ★17 · Oct 12, 2024 · Updated last year
- ★14 · May 9, 2024 · Updated 2 years ago
- Open source code for paper ★15 · May 27, 2024 · Updated last year
- A Chrome/Edge extension to help you quickly scan through the flood of daily ArXiv papers. ★15 · Mar 29, 2025 · Updated last year
- [ICLR 2024 Spotlight] Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments ★20 · Aug 19, 2025 · Updated 9 months ago
- DREEM Relates Every Entities' Motion (DREEM). Global Tracking Transformers for biological multi-object tracking. ★17 · Mar 23, 2026 · Updated 2 months ago
- ★35 · Aug 17, 2025 · Updated 9 months ago
- CIFAR-10-Warehouse: Towards Broad and More Realistic Testbeds in Model Generalization Analysis ★18 · Jul 15, 2024 · Updated last year
- Hercules: Attributable and Scalable Opinion Summarization (ACL 2023) ★20 · Nov 8, 2023 · Updated 2 years ago
- ★17 · Nov 3, 2024 · Updated last year
- Neural network backend for training and inference for animal pose estimation. ★20 · Updated this week
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ★84 · Nov 27, 2025 · Updated 5 months ago