Top papers related to LLM-based agent evaluation
☆90 · Oct 21, 2025 · Updated 6 months ago
Alternatives and similar repositories for LLM-Agent-Evaluation-Survey
Users that are interested in LLM-Agent-Evaluation-Survey are comparing it to the libraries listed below.
- [AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper. ☆21 · Jan 22, 2026 · Updated 3 months ago
- Official implementation of "Dataset Size Recovery from LoRA Weights" paper. ☆34 · Jun 30, 2024 · Updated last year
- Official PyTorch Implementation for the "Unsupervised Model Tree Heritage Recovery" paper (ICLR 2025). ☆63 · Jul 1, 2025 · Updated 10 months ago
- Code release for "Time Series Anomaly Detection by Cumulative Radon Features" ☆12 · Feb 8, 2022 · Updated 4 years ago
- Official PyTorch Implementation for the "Recovering the Pre-Fine-Tuning Weights of Generative Models" paper (ICML 2024). ☆86 · Apr 15, 2025 · Updated last year
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆216 · Sep 18, 2025 · Updated 7 months ago
- This repo contains the official PyTorch implementation of "A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement" (… ☆28 · Aug 8, 2022 · Updated 3 years ago
- Official PyTorch Implementation for the "Distilling Datasets Into Less Than One Image" paper. ☆39 · Jun 6, 2024 · Updated last year
- The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral) ☆49 · Aug 15, 2025 · Updated 8 months ago
- A Lossless Compression Library for AI pipelines ☆314 · Apr 11, 2026 · Updated 3 weeks ago
- Official Implementation for the "Back to the Feature: Classical 3D Features are (Almost) All You Need for 3D Anomaly Detection" paper (VA… ☆139 · Nov 28, 2022 · Updated 3 years ago
- ☆28 · Feb 11, 2026 · Updated 2 months ago
- Repository for "Attribute First, then Generate: Locally-attributable Grounded Text Generation", ACL 2024 ☆30 · Dec 19, 2024 · Updated last year
- Awesome AI Benchmarks ☆31 · Jan 16, 2026 · Updated 3 months ago
- Official implementation for "Stable Flow: Vital Layers for Training-Free Image Editing" [CVPR 2025] ☆408 · Jun 8, 2025 · Updated 11 months ago
- Official PyTorch implementation of LaMI: Augmenting Large Language Models via Late Multi-Image Fusion (ACL 2026) ☆17 · Apr 14, 2026 · Updated 3 weeks ago
- The official repo of "WhiStress: Enriching Transcriptions with Sentence Stress Detection" (Interspeech 2025) ☆37 · Jul 24, 2025 · Updated 9 months ago
- jQuery, React and Streamlit applications written by LLMs ☆15 · Dec 24, 2023 · Updated 2 years ago
- Code repository for CISO agent as part of ITBench ☆20 · May 8, 2025 · Updated last year
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730 ☆131 · Dec 8, 2023 · Updated 2 years ago
- ☆10 · Jan 31, 2026 · Updated 3 months ago
- This repository contains a user-friendly Graphical User Interface (GUI) for interacting with the Hebrew-Mistral-7B language model. ☆15 · May 3, 2024 · Updated 2 years ago
- The dataset includes widget captions that describe UI elements' functionalities. It is used for training and evaluation of the widget ca… ☆23 · Jun 24, 2021 · Updated 4 years ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on … ☆13 · Mar 25, 2024 · Updated 2 years ago
- Experiments on using ChatGPT for failure mode classification ☆12 · Sep 20, 2023 · Updated 2 years ago
- Survey paper: From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents. ☆48 · Apr 3, 2026 · Updated last month
- Code that translates grammar into PDDL, runs a planner to produce multiple plans, translates plans into trainable lale pipelines and trai… ☆18 · Sep 17, 2025 · Updated 7 months ago
- Official implementation of paper 'Fair Feature Distillation for Visual Recognition' ☆17 · Jun 23, 2021 · Updated 4 years ago
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr… ☆23 · Mar 14, 2024 · Updated 2 years ago
- A proxy for minimax-m2, enabling interleaved thinking and tool calls. ☆39 · Nov 21, 2025 · Updated 5 months ago
- Implementation of KDR-Agent, the AAAI 2025 accepted paper, focusing on knowledge-driven reasoning for autonomous agents. ☆18 · Nov 24, 2025 · Updated 5 months ago
- The predecessor of CiteLab. ☆18 · Feb 3, 2026 · Updated 3 months ago
- MLflow deployment plugin For IBM-cloud-watson-ml ☆15 · May 7, 2025 · Updated last year
- Codebase for EnterpriseOps-Gym from ServiceNow ☆83 · Apr 30, 2026 · Updated last week
- make logging fun again ☆20 · Apr 9, 2017 · Updated 9 years ago
- ANAC Supply Chain Management League Development Environment ☆11 · Updated this week
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 6 months ago
- Source code for paper "Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration" of NeurIPS 2019 ☆10 · Jan 25, 2024 · Updated 2 years ago
- ☆36 · Jun 10, 2024 · Updated last year