The official repo for our paper: LegalAgentBench: Evaluating LLM Agents in Legal Domain
☆43Dec 30, 2024Updated last year
Alternatives and similar repositories for LegalAgentBench
Users who are interested in LegalAgentBench are comparing it to the repositories listed below
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)☆19Jul 19, 2025Updated 7 months ago
- Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs (EMNLP 2024)☆16Nov 17, 2024Updated last year
- [COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models☆18Jan 18, 2025Updated last year
- StaRD: Statute Retrieval Dataset based on Real-World Legal Consultation☆20Apr 24, 2025Updated 10 months ago
- A general framework for evaluating the performance of large language models (LLMs) based on a peer-review mechanism among LLMs☆19Aug 3, 2024Updated last year
- Code for JuDGE, SIGIR 2025 Long Paper☆32Aug 7, 2025Updated 6 months ago
- A Survey of Multimodal Retrieval-Augmented Generation☆20Nov 3, 2025Updated 4 months ago
- LexEval: A Comprehensive Benchmark for Evaluating Large Language Models in Legal Domain☆90Oct 30, 2024Updated last year
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?☆38Jun 23, 2025Updated 8 months ago
- Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation☆35Mar 3, 2025Updated last year
- A platform for building reliable AI agents☆90Feb 19, 2026Updated last week
- Official code space for "SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development"☆61Oct 24, 2025Updated 4 months ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆130Mar 18, 2025Updated 11 months ago
- Test-time compute in information retrieval☆54Jul 8, 2025Updated 7 months ago
- ☆13Aug 12, 2022Updated 3 years ago
- Blockchain-based product traceability system☆10Mar 11, 2021Updated 4 years ago
- ☆10May 19, 2024Updated last year
- Repository for the paper: "Using deep learning to predict outcomes of legal appeals better than human experts"☆10Aug 1, 2022Updated 3 years ago
- On the Robustness of GUI Grounding Models Against Image Attacks☆12Apr 8, 2025Updated 10 months ago
- ☆11Jul 21, 2024Updated last year
- FamilyTool benchmark☆12Sep 10, 2025Updated 5 months ago
- ☆26Jul 29, 2025Updated 7 months ago
- Official codebase for NeurIPS 2022 paper End-to-end Learning to Index and Search in Large Output Spaces☆12Apr 19, 2023Updated 2 years ago
- ☆11Jan 21, 2024Updated 2 years ago
- ☆13Sep 26, 2024Updated last year
- ☆12Jan 7, 2020Updated 6 years ago
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State☆11Sep 21, 2024Updated last year
- Universal LLM security auditor with automated jailbreak testing, DSPy optimization, and OWASP 2025-aligned attack patterns☆21Oct 23, 2025Updated 4 months ago
- ☆10Oct 6, 2021Updated 4 years ago
- Experimental tl;dr summaries for datasets on the Hugging Face Hub!☆10Apr 4, 2024Updated last year
- ⚙️ Lightweight & smart Bun & Browser configuration loader.☆15Updated this week
- Explanation of the llama2 repo.☆12Jul 18, 2024Updated last year
- [NeurIPS 2024 poster] Cross-model Control: Improving Multiple Large Language Models in One-time Training☆14Oct 25, 2024Updated last year
- A Benchmark for Multi-Stage Legal Case Documents Generation☆15Feb 24, 2025Updated last year
- IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)☆14Jul 14, 2025Updated 7 months ago
- Reasoning-based Evaluation and Ranking of Translations.☆19Jul 18, 2025Updated 7 months ago
- ☆101Feb 15, 2026Updated 2 weeks ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆53Jun 6, 2025Updated 8 months ago
- ☆12Nov 14, 2024Updated last year