[EMNLP 2024] A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models.
☆22 · Sep 23, 2024 · Updated last year
Alternatives and similar repositories for ToolBeHonest
Users who are interested in ToolBeHonest are comparing it to the libraries listed below.
- [EMNLP 2023] Question Answering as Programming for Solving Time-Sensitive Questions ☆12 · Dec 18, 2023 · Updated 2 years ago
- ☆21 · Aug 19, 2024 · Updated last year
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆66 · Dec 8, 2024 · Updated last year
- Implementation of LREC-COLING 2024 paper A Frustratingly Simple Decoding Method for Neural Text Generation ☆19 · Feb 23, 2024 · Updated 2 years ago
- [ICLR 2025] ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation ☆131 · Dec 19, 2025 · Updated 6 months ago
- ☆29 · May 24, 2024 · Updated 2 years ago
- SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models ☆17 · Jun 24, 2024 · Updated 2 years ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆51 · Dec 15, 2023 · Updated 2 years ago
- Implementation of NAACL 2024 paper Unveiling the Generalization Power of Fine-Tuned Large Language Models ☆11 · Mar 14, 2024 · Updated 2 years ago
- ☆28 · Apr 19, 2026 · Updated 2 months ago
- ☆21 · Nov 26, 2024 · Updated last year
- Source code for Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts ☆17 · Sep 2, 2024 · Updated last year
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment ☆17 · Jan 16, 2025 · Updated last year
- ☆32 · Jun 5, 2025 · Updated last year
- Python client library for Cleanlab Trustworthy Language Model ☆24 · Dec 9, 2025 · Updated 6 months ago
- Safety-J: Evaluating Safety with Critique ☆16 · Jul 28, 2024 · Updated last year
- codes for "Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models" ☆12 · Feb 10, 2025 · Updated last year
- ☆16 · Sep 27, 2023 · Updated 2 years ago
- Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model' ☆45 · Updated this week
- ☆15 · Apr 22, 2024 · Updated 2 years ago
- fastNLP reimplementation of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ☆11 · Dec 11, 2020 · Updated 5 years ago
- Code for paper "Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication" ☆23 · Mar 30, 2024 · Updated 2 years ago
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models" ☆41 · Jun 24, 2025 · Updated last year
- Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content" ☆24 · Jul 28, 2024 · Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 · Jul 15, 2025 · Updated 11 months ago
- [ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO ☆65 · Apr 30, 2025 · Updated last year
- [NeurIPS 2024 poster] Cross-model Control: Improving Multiple Large Language Models in One-time Training ☆14 · Oct 25, 2024 · Updated last year
- This is the repository containing the solution of the homework for the CS224W course at Stanford: Machine Learning with Graphs ☆11 · Jul 19, 2020 · Updated 5 years ago
- Chinese Generation Evaluation ☆13 · Aug 14, 2023 · Updated 2 years ago
- Zero-shot Learning by Generating Task-specific Adapters ☆14 · Apr 2, 2021 · Updated 5 years ago
- Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data ☆43 · May 29, 2024 · Updated 2 years ago
- ☆31 · Oct 20, 2025 · Updated 8 months ago
- Zero-shot evaluation on LEXGLUE tasks with GPT3.5 ☆29 · Mar 11, 2023 · Updated 3 years ago
- ☆20 · Aug 31, 2022 · Updated 3 years ago
- Repository for the Exposing Outlier Exposure paper ☆12 · Aug 20, 2024 · Updated last year
- ALBench Leaderboard for active learning in object detection ☆15 · Jan 13, 2023 · Updated 3 years ago
- Interpretable unified language safety checking with large language models ☆32 · Apr 15, 2023 · Updated 3 years ago
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection. ☆23 · Mar 11, 2024 · Updated 2 years ago
- ☆25 · Jan 29, 2026 · Updated 5 months ago