Asaf-Yehudai / LLM-Agent-Evaluation-Survey
Top papers related to LLM-based agent evaluation
☆68 Updated 2 weeks ago
Alternatives and similar repositories for LLM-Agent-Evaluation-Survey
Users who are interested in LLM-Agent-Evaluation-Survey are comparing it to the repositories listed below
- Repository for "Attribute First, then Generate: Locally-attributable Grounded Text Generation", ACL 2024 ☆29 Updated 5 months ago
- A package dedicated to running benchmark agreement testing ☆16 Updated 3 weeks ago
- ☆65 Updated 2 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆61 Updated last week
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- ☆58 Updated 3 weeks ago
- This is the official repository for Inheritune. ☆111 Updated 3 months ago
- Official implementation of "Dataset Size Recovery from LoRA Weights" paper. ☆33 Updated 11 months ago
- accompanying material for sleep-time compute paper ☆90 Updated last month
- Verifiers for LLM Reinforcement Learning ☆56 Updated last month
- Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation ☆31 Updated 3 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆64 Updated 10 months ago
- Aioli: A unified optimization framework for language model data mixing ☆25 Updated 4 months ago
- Exploration of automated dataset selection approaches at large scales. ☆41 Updated 3 months ago
- ☆34 Updated last week
- ☆45 Updated 2 weeks ago
- ☆24 Updated 8 months ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆110 Updated 2 weeks ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆89 Updated last week
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆201 Updated last month
- Source code for the collaborative reasoner research project at Meta FAIR. ☆87 Updated last month
- ☆83 Updated 3 weeks ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆87 Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 8 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆54 Updated 3 months ago
- PyTorch library for Active Fine-Tuning ☆77 Updated 3 months ago
- ☆89 Updated last week
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated last month
- ☆68 Updated 9 months ago