om-ai-lab/open-agent-leaderboard

Reproducible Language Agent Research

☆36

Alternatives and similar repositories for open-agent-leaderboard

Users who are interested in open-agent-leaderboard are comparing it to the libraries listed below.

openai / azure-cli
Azure Command-Line Interface
☆14 · Mar 26, 2026 · Updated 4 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36 · Sep 24, 2024 · Updated last year
NJUNLP / Hallu-PI
The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …
☆11 · Sep 27, 2024 · Updated last year
aypan17 / reward-misspecification
☆10 · Mar 13, 2023 · Updated 3 years ago
elastic / workplace-search-python
Elastic Workplace Search Official Python Client
☆10 · Aug 8, 2024 · Updated last year
Kamichanw / CoS
[ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"
☆30 · Jun 23, 2025 · Updated last year
shuaizhao95 / ICLAttack
ICL backdoor attack
☆17 · Nov 4, 2024 · Updated last year
li-xirong / video-retrieval
Deep Learning for Video Retrieval by Natural Language
☆11 · Oct 20, 2019 · Updated 6 years ago
TIGER-AI-Lab / Program-of-Thoughts
Data and code for Program of Thoughts [TMLR 2023]
☆317 · May 15, 2024 · Updated 2 years ago
THUDM / DataSciBench
DataSciBench: An LLM Agent Benchmark for Data Science (Findings of ACL 2026)
☆66 · Jan 21, 2026 · Updated 6 months ago
qcznlp / uncertainty_attack
☆23 · Sep 2, 2025 · Updated 10 months ago
volkancirik / refer360
Repository for the ACL 2020 paper "Refer360°: A Referring Expression Recognition Dataset in 360° Images"
☆15 · Jun 26, 2021 · Updated 5 years ago
uw-nsl / CleanGen
[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
☆19 · Mar 9, 2025 · Updated last year
eddiegaoo / Apt-Serve
☆21 · Jun 9, 2025 · Updated last year
divelab / Sys2Bench
Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, l…
☆31 · Mar 5, 2025 · Updated last year
Vincent-HKUSTGZ / PEFTGuard
Official repository for PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning, accepted at 2025 IEEE Symposium on…
☆18 · Jul 4, 2025 · Updated last year
tsinghua-fib-lab / DoT
Official implementation for 'Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient LLM Reasoning'
☆27 · Feb 18, 2025 · Updated last year
RLHFlow / GVM
☆16 · Jul 29, 2025 · Updated last year
DeepMathLLM / DeepMath
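An open-source mathematical large-model project that aims to explore whether large models have mathematical creativity, and what potential they hold for frontier mathematical research.
☆22 · Mar 19, 2026 · Updated 4 months ago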
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆143 · Apr 30, 2026 · Updated 2 months ago
OSU-NLP-Group / Explorer
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
☆29 · Feb 17, 2026 · Updated 5 months ago
schelotto / Gaussian_Word_Embedding
PyTorch implementation of Gaussian word embeddings
☆19 · Apr 7, 2018 · Updated 8 years ago
BaichuanSEED / BaichuanSEED.github.io
Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…
☆18 · Aug 28, 2024 · Updated last year
zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆180 · Jan 20, 2025 · Updated last year
NKU-HLT / MusicEval-baseline
☆12 · Apr 18, 2025 · Updated last year
taishan1994 / baichuan-Qlora-Tuning
Instruction fine-tuning of the baichuan-7B large model using QLoRA.
☆22 · Jun 22, 2023 · Updated 3 years ago
zzbright1998 / SentenceKV
Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache com…
☆15 · Sep 29, 2025 · Updated 10 months ago
KbsdJames / MATH-Minos
The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…
☆38 · Jul 25, 2024 · Updated 2 years ago
SigmaQuan / Awesome-Chinese-Corpus-Datasets-and-Models
Awesome Chinese Corpus Datasets and Models.
☆19 · Oct 28, 2019 · Updated 6 years ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆33 · Aug 5, 2025 · Updated 11 months ago
Trae1ounG / BuPO
[arXiv:2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
☆60 · Feb 6, 2026 · Updated 5 months ago
GeniusHTX / TALE
☆151 · Sep 12, 2025 · Updated 10 months ago
Shikib / Response-Generation-Baselines
Transformer model for the Amazon Topical-Chat Corpus. Baselines for DSTC9 Track 3.
☆19 · Jul 9, 2020 · Updated 6 years ago
Sausage-SONG / Few-shot-action-recognition
Code for the arXiv paper "Semi-supervised Few-shot Atomic Action Recognition".
☆18 · Jan 2, 2021 · Updated 5 years ago
Achillesxu / SpliteDahua-HaikangStreamToES
Gets the media stream from the Dahua/Haikang IPC SDK and demuxes it into video and audio elementary streams (ES).
☆14 · Nov 15, 2015 · Updated 10 years ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆91 · Feb 14, 2025 · Updated last year
abarbu / objectnet-template-pytorch
Baseline model for the ObjectNet competition
☆18 · Jan 13, 2021 · Updated 5 years ago
formll / resolving-scaling-law-discrepancies
☆19 · Nov 4, 2025 · Updated 8 months ago
ErxinYu / CoSafe-Dataset
☆13 · Nov 12, 2024 · Updated last year