zlwang-cs/OfficeBench

OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

☆41

Alternatives and similar repositories for OfficeBench

Users who are interested in OfficeBench are comparing it to the repositories listed below.

harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆16 · Nov 28, 2024 · Updated last year
agentsea / osuniverse
Benchmark of complex, multimodal desktop-oriented tasks for advanced GUI-navigation AI agents
☆24 · May 7, 2025 · Updated last year
EvanZhuang / knowledge_flow
Official Implementation of Knowledge Flow Prompting
☆35 · Oct 20, 2025 · Updated 9 months ago
scaleapi / PRBench
Open source codebase for PRBench
☆18 · Jan 15, 2026 · Updated 6 months ago
agential-ai / agential
🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!
☆54 · Jul 9, 2025 · Updated last year
Hritikbansal / entigen_emnlp
How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?
☆13 · Aug 16, 2023 · Updated 2 years ago
D3Mlab / cr-lt-kgqa
CR-LT KGQA Dataset Repository
☆10 · Jun 1, 2025 · Updated last year
google-research-datasets / QuoteSum
QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on …
☆13 · Mar 25, 2024 · Updated 2 years ago
biergaizi / KonaChanWallpaper
Fetch a random wallpaper from Konachan.
☆10 · Jun 4, 2018 · Updated 8 years ago
DavideBuffelli / SizeShiftReg
Code for the paper "SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks"
☆12 · Jan 17, 2023 · Updated 3 years ago
WadeYin9712 / GeoMLAMA
☆15 · Oct 24, 2022 · Updated 3 years ago
WadeYin9712 / UI-Simulator
Code for 🌍 UI-Simulator: LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
☆21 · Oct 17, 2025 · Updated 9 months ago
xxxiaol / spatial-commonsense
Source code and data for Things not Written in Text: Exploring Spatial Commonsense from Visual Signals (ACL2022 main conference paper).
☆20 · Oct 10, 2022 · Updated 3 years ago
jm199504 / Paper-Notes
Personal paper-reading notes (GNN / Imaging / CNN / PaperRobot / FinancialTimeSeries)
☆15 · Jul 30, 2019 · Updated 6 years ago
didiforgithub / SwarmAgent
🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior…
☆13 · Dec 5, 2023 · Updated 2 years ago
magicgh / Ask-before-Plan
[EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning
☆24 · Jul 28, 2025 · Updated 11 months ago
0xWelt / VibeRL
VibeRL is a Reinforcement Learning framework built essentially through vibe coding with Kimi K2.
☆17 · Updated this week
zlwang-cs / LASER-release
Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework
☆14 · May 31, 2023 · Updated 3 years ago
xxxiaol / magic-if
Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023…
☆31 · Jun 4, 2023 · Updated 3 years ago
thoughttrace-project / ThoughtTrace
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
☆15 · Jun 28, 2026 · Updated 3 weeks ago
atfortes / LLMSymbolicReasoningBench
Synthetic data generation for evaluating LLM symbolic and logic reasoning
☆23 · Mar 6, 2026 · Updated 4 months ago
ZJU-ACES-ISE / ChatUITest
Under construction
☆14 · Jan 15, 2025 · Updated last year
StarWalkin / UI-NEXUS
This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin…
☆14 · Jul 27, 2025 · Updated 11 months ago
kohjingyu / multi-agent-computer-use
Code for the multi-agent computer use project.
☆21 · Jul 3, 2026 · Updated 3 weeks ago
KomeijiForce / EmojiLM
Official Implementation for "EmojiLM: Modeling the New Emoji Language"
☆12 · Feb 23, 2024 · Updated 2 years ago
TruthfulAI-research / negation_neglect
Code for Negation Neglect
☆16 · May 22, 2026 · Updated 2 months ago
cbouilla / pcg
Prediction algorithms for the PCG pseudo-random generator
☆15 · Nov 13, 2020 · Updated 5 years ago
yaof20 / ReaL
Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback"
☆42 · Jul 21, 2025 · Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆78 · Aug 17, 2024 · Updated last year
dlr-ve-esy / amiris
Agent-based Market model for the Investigation of Renewable and Integrated energy Systems (Official GitLab Mirror)
☆19 · Updated this week
yale-nlp / ODSum
Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization"
☆11 · Sep 20, 2024 · Updated last year
modestyachts / cifar-10.2
Host CIFAR-10.2 Data Set
☆13 · Sep 22, 2021 · Updated 4 years ago
bpwu1 / confidence-regulation-neurons
Confidence Regulation Neurons in Language Models (NeurIPS 2024)
☆15 · Feb 1, 2025 · Updated last year
aburns4 / textualforesight
☆12 · Aug 8, 2024 · Updated last year
init0xyz / AdaCQR
Implementation of AdaCQR (COLING 2025)
☆15 · Dec 30, 2024 · Updated last year
WadeYin9712 / GD-VCR
Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).
☆29 · Sep 4, 2021 · Updated 4 years ago
snumprlab / capeam
Official Implementation of CAPEAM (ICCV'23)
☆16 · Nov 30, 2024 · Updated last year
minecraft-saar / autoplanbench
☆21 · Apr 27, 2026 · Updated 2 months ago
yzhao062 / auditable
Audit any agent decision across its past, present, and future, on one typed graph.
☆16 · Updated this week