Junjie-Ye/RoTBench

[EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

☆15

Alternatives and similar repositories for RoTBench

Users interested in RoTBench compare it with the repositories listed below.

Junjie-Ye / ToolEyes
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆73 · May 13, 2025 · Updated 9 months ago
deep-spin / sparse_continuous_distributions
This repository provides open-source code for sparse continuous distributions and corresponding Fenchel-Young losses.
☆16 · May 10, 2023 · Updated 2 years ago
Junjie-Ye / MulDimIF
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
☆20 · May 24, 2025 · Updated 9 months ago
allenai / dream
☆24 · Sep 2, 2024 · Updated last year
laituan245 / EL-Dockers
☆25 · Jul 15, 2023 · Updated 2 years ago
KempnerInstitute / llm_uncertainty
Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"
☆11 · Apr 15, 2024 · Updated last year
ZebinYang / exnn
Enhanced Explainable Neural Network
☆10 · Dec 25, 2021 · Updated 4 years ago
gianluigilopardo / smace
Code for the paper "SMACE: A New Method for the Interpretability of Composite Decision Systems", ECML 2022
☆15 · Apr 17, 2023 · Updated 2 years ago
antoinedesir / RUMnet
☆10 · Aug 16, 2023 · Updated 2 years ago
jiaqima / SODEN
SODEN: A Scalable Continuous-Time Survival Model through Ordinary Differential Equation Networks
☆14 · Mar 2, 2023 · Updated 3 years ago
ansonb / FeTA_TMLR
This repository reproduces the results in the paper "How expressive are transformers in spectral domain for graphs?" (published in TMLR)
☆12 · Jul 10, 2022 · Updated 3 years ago
A-star-logic / memoire
RAG as a service to deploy your AI faster
☆14 · Apr 14, 2025 · Updated 10 months ago
LiLabTsinghua / Course-AI-EnzymeDesign
This is for the AI enzyme design course
☆13 · Nov 10, 2025 · Updated 4 months ago
HUBioDataLab / CROssBARv2-KG
This is the repo for CROssBARv2 Knowledge Graph data. CROssBARv2 is a heterogeneous general-purpose biomedical KG-based system.
☆11 · Feb 4, 2026 · Updated last month
QureGenAI-Biotech / TyxonQ
A General Quantum Software
☆18 · Feb 24, 2026 · Updated last week
jacksonloper / leg-gps
☆10 · Apr 15, 2022 · Updated 3 years ago
lindermanlab / warhmm
☆12 · Feb 27, 2023 · Updated 3 years ago
clinicalml / dgm
Deep Generative Model (Torch)
☆11 · Apr 19, 2016 · Updated 9 years ago
fdn-planning / FDN_Model
Multi-resource Dynamic Coordinated Planning of Flexible Distribution Network
☆15 · Jun 11, 2024 · Updated last year
YuanLi95 / GATT-For-Aspect
☆11 · Dec 14, 2022 · Updated 3 years ago
urosolia / MOMDP
Solver for discrete Mixed Observable Markov Decision Processes
☆11 · Oct 30, 2020 · Updated 5 years ago
hanshengjiang / Reference_Effects
Software package for intertemporal pricing optimization under reference effects and consumer heterogeneity estimation. Please see README.…
☆10 · Mar 7, 2024 · Updated 2 years ago
yjchoe / ComparingForecasters
Comparing sequential forecasters via confidence sequences & e-processes
☆11 · Oct 24, 2023 · Updated 2 years ago
DehongXu / grid-cell-rnn
Official code for Conformal Isometry of Lie Group Representation in Recurrent Network of Grid Cells (NeurIPS workshop on Symmetry and Geo…
☆13 · Nov 1, 2022 · Updated 3 years ago
lych1233 / GAMMA-human-ai-collaboration
☆11 · Jan 13, 2026 · Updated last month
starjob42 / Starjob
JSSP dataset for LLMs
☆16 · May 29, 2025 · Updated 9 months ago
pyrobits / DGCluster
☆13 · Feb 4, 2025 · Updated last year
stephmilani / multiagent-viper
☆10 · Oct 26, 2022 · Updated 3 years ago
INESCTEC / TemporalSummarizationFramework
Temporal summarization framework
☆10 · Dec 4, 2023 · Updated 2 years ago
yunzhusong / NAACL2022-REFLECT
Code for the paper: Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness
☆12 · Oct 22, 2023 · Updated 2 years ago
wangyu-ustc / LargeScaleWashing
The official implementation of the paper "Large Scale Knowledge Washing"
☆10 · Jun 12, 2024 · Updated last year
dtak / POPCORN-POMDP
Implementation of "POPCORN: Partially Observed Prediction Constrained Reinforcement Learning" (Futoma, Hughes, Doshi-Velez, AISTATS 2020)
☆11 · May 19, 2021 · Updated 4 years ago
lezhang7 / TreeMix
[NAACL 2022] TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding
☆10 · Jul 15, 2023 · Updated 2 years ago
thashim / population-diffusions
Code and paper repository for the ICML paper "Learning population level diffusions with generative recurrent networks"
☆11 · May 26, 2016 · Updated 9 years ago
zhuye88 / StreaKHC
A novel incremental hierarchical clustering algorithm (KDD 22)
☆10 · Aug 31, 2023 · Updated 2 years ago
BoChenGroup / SawETM
The demo for "Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network" in ICML 2021
☆11 · Aug 17, 2021 · Updated 4 years ago
aditya-grover / bias-correction-generative
Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting
☆12 · Mar 24, 2023 · Updated 2 years ago
mengcz13 / IJCAI2022_ST-KMRN
Official implementation of "Physics-Informed Long-Sequence Forecasting From Multi-Resolution Spatiotemporal Data".
☆11 · Dec 12, 2022 · Updated 3 years ago
HanbaekLyu / NNetwork
Custom graph/network/multi-weighted network class based on storing list of neighbors for each node (as opposed to edge list) for scalabl…
☆12 · Jan 18, 2024 · Updated 2 years ago